Abstract

We prove two main results on how arbitrary linear threshold functions $f(x) =
\mathrm{sign}(w \cdot x - \theta)$ over the $n$-dimensional Boolean hypercube can be approximated by
simple threshold functions.

Our first result shows that every $n$-variable threshold function $f$ is $\epsilon$-close to a
threshold function depending only on $\mathrm{Inf}(f)^2 \cdot \mathrm{poly}(1/\epsilon)$ many variables, where
$\mathrm{Inf}(f)$ denotes the total influence or average sensitivity of $f$. This is an exponential
sharpening of Friedgut's well-known theorem [Fri98], which states that every Boolean function $f$
is $\epsilon$-close to a function depending only on $2^{O(\mathrm{Inf}(f)/\epsilon)}$ many variables, for the case of
threshold functions. We complement this upper bound by showing that $\Omega(\mathrm{Inf}(f)^2 +
1/\epsilon^2)$ many variables are required for $\epsilon$-approximating threshold functions.

Our second result is a proof that every $n$-variable threshold function is $\epsilon$-close to a
threshold function with integer weights at most $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$. This is a
significant improvement, in the dependence on the error parameter $\epsilon$, on an earlier result of
[Ser07] which gave a $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^2)}$ bound. Our improvement is obtained via a new proof
technique that uses strong anti-concentration bounds from probability theory. The
new technique also gives a simple and modular proof of the original [Ser07] result,
and extends to give low-weight approximators for threshold functions under a range of
probability distributions beyond just the uniform distribution.
1 Introduction



Linear threshold functions (henceforth simply called threshold functions) are functions
$f : \{-1,1\}^n \to \{-1,1\}$ of the form $f(x) = \mathrm{sign}(w \cdot x - \theta)$ where the weights $w_1, \ldots, w_n$ and
the threshold $\theta$ may be arbitrary real values. Threshold functions are a fundamental type
of Boolean function and have played an important role in computer science for decades,
see e.g. [Der65, Mur71, SRK95]. Recent years have witnessed a flurry of research activity
on threshold functions from many perspectives of theoretical computer science, including
hardness of learning [FGKP06, KS08], efficient learning algorithms in various models [Kal07,
OS08, KKMS08], property testing [MORS09, GS07], communication complexity and circuit
complexity [She07], monotone computation [BW06], derandomization [RS09, DGJ+09], and
more.

Despite their seeming simplicity, threshold functions can have surprisingly rich structure,
and basic questions about them can be unexpectedly challenging to answer. As one example,
a moment's thought shows that every threshold function $f$ can be realized with integer
weights $w_1, \ldots, w_n$: how large do those integer weights need to be? A fairly straightforward
argument gives a bound of $2^{O(n \log n)}$, but while this upper bound was known at least
since 1961 [MTT61] and rediscovered several times (e.g. [Hon87, Rag88]), more than thirty
years elapsed before a matching lower bound of $2^{\Omega(n \log n)}$ was finally obtained via a fairly
sophisticated construction and proof [Has94, AV97].

This paper is about approximating arbitrary threshold functions using "simple" threshold
functions, meaning ones that depend on few variables or have small integer weights. We use
a natural notion of approximation with respect to the uniform distribution: throughout the
paper "$h$ is an $\epsilon$-approximator for $f$" means that $\Pr[h(x) \ne f(x)] \le \epsilon$. (All probabilities and
expectations over $x \in \{-1,1\}^n$ are taken with respect to the uniform distribution, unless
otherwise specified. In Section 4 we shall consider more general notions of approximation
with respect to other distributions as well.) We prove two main results about approximating
threshold functions, which we motivate and describe below.



1.1 First main result: optimally approximating threshold functions by juntas.

The influence of coordinate $i$ on $f : \{-1,1\}^n \to \{-1,1\}$ is $\mathrm{Inf}_i(f) \stackrel{\mathrm{def}}{=} \Pr[f(x) \ne f(x^{\oplus i})]$,
where $x^{\oplus i}$ denotes $x$ with the $i$-th bit flipped. The total influence of $f$, written $\mathrm{Inf}(f)$,
is $\sum_i \mathrm{Inf}_i(f)$; it is a normalized measure of the fraction of edges in the hypercube that are
rendered bichromatic by $f$, and is equal to the "average sensitivity" of $f$. It is well known (see
[FK96] or [BT96] for an explicit proof) that every threshold function has $\mathrm{Inf}(f) \le \sqrt{n}$, and
that the majority function on $n$ variables achieves $\mathrm{Inf}(f) = \Omega(\sqrt{n})$, and in fact maximizes
$\mathrm{Inf}(f)$ over all threshold (or even all unate) functions.
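
To make these definitions concrete, the following brute-force sketch enumerates the hypercube and computes the influences of a small majority function. The helper names (`threshold`, `influences`) and the convention that sign(0) = +1 are assumptions made for this illustration; they are not taken from the paper.

```python
from itertools import product
from math import sqrt

def sign(t):
    # Convention assumed for this sketch: sign(0) = +1.
    return 1 if t >= 0 else -1

def threshold(w, theta):
    # f(x) = sign(w . x - theta)
    return lambda x: sign(sum(wi * xi for wi, xi in zip(w, x)) - theta)

def influences(f, n):
    # Inf_i(f) = Pr_x[f(x) != f(x with bit i flipped)], x uniform on {-1,1}^n.
    points = list(product((-1, 1), repeat=n))
    result = []
    for i in range(n):
        flips = 0
        for x in points:
            y = x[:i] + (-x[i],) + x[i + 1:]
            if f(x) != f(y):
                flips += 1
        result.append(flips / len(points))
    return result

n = 9
maj = threshold([1] * n, 0)      # majority on 9 variables (n odd, so no ties)
inf_maj = influences(maj, n)
total = sum(inf_maj)
print(inf_maj[0], total, sqrt(n))
```

For $n = 9$ a coordinate is pivotal exactly when the other eight coordinates sum to zero, so each influence is $\binom{8}{4}/2^8 = 70/256$ and the total influence is $630/256 \approx 2.46$, below $\sqrt{9} = 3$.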
In [Fri98], Friedgut proved the following:

Theorem. [Fri98] Every Boolean function $f$ is $\epsilon$-approximated by a $2^{O(\mathrm{Inf}(f)/\epsilon)}$-junta, i.e. a
function depending only on $2^{O(\mathrm{Inf}(f)/\epsilon)}$ of the $n$ input variables.

Friedgut's theorem is an important structural result about Boolean functions and has been
usefully applied in several areas of theoretical computer science, including hardness of
approximation [DS05, CKK+06, KR08], metric embeddings [KR06], and learning theory [OS07]. In
Section 2.5 we discuss the role of this theorem in a sequence of results on the Fourier
representation of Boolean functions.

Friedgut showed that his bound is best possible for general Boolean functions, by giving
an explicit family of functions which require $2^{\Omega(\mathrm{Inf}(f)/\epsilon)}$-juntas for any $\epsilon$-approximation. A
bound of the form $2^{O(\mathrm{Inf}(f)/\epsilon)}$ is of course nontrivial only if $\mathrm{Inf}(f) \ll \log n$, which is rather
small; thus, it is natural to ask whether various restricted classes of functions, such as
threshold functions, might admit stronger bounds.

Our first main result is an exponentially stronger version of Friedgut's theorem for
threshold functions:

Theorem 1 (First Main Theorem). Every threshold function $f$ is $\epsilon$-approximated by an
$\mathrm{Inf}(f)^2 \cdot \mathrm{poly}(1/\epsilon)$-junta (which is itself a threshold function).

This bound is essentially optimal; easy examples show that $\Omega(\mathrm{Inf}(f)^2 + 1/\epsilon^2)$ many
variables may be required for $\epsilon$-approximation (see Section 2.5). We conjecture that Theorem 1
extends to degree-$d$ polynomial threshold functions with an exponential dependence on $d$
in the bound, and also conjecture a different extension of Theorem 1 that is inspired by a
theorem of Bourgain; see Section 2.5.

Techniques. The proof of Friedgut's theorem makes essential use of the Bonami-Gross-Beckner
hypercontractive inequality [Bon70, Gro75, Bec75]. Our proof of Theorem 1 takes a
completely different route and does not use hypercontractivity; instead, the main ingredients
are recent Fourier results on threshold functions from [OS08] and a probabilistic construction
which is reminiscent of Bruck and Smolensky's randomized construction of polynomial
threshold functions [BS92].

In more detail, a key notion in our proof is that of a regular threshold function; roughly
speaking, this is a threshold function where each of the weights $w_i$ is "small" relative to the
2-norm of the weight vector. Given a regular threshold function $g = \mathrm{sign}(w \cdot x - \theta)$, we
use the weights $w_i$ to define a probability distribution over approximators to $g$ (this is done
similarly to [BS92]). We show (Lemmas 8 and 9) that a randomly drawn approximator from
this distribution has high expected accuracy and does not depend on too many variables
(the upper bound is given in terms of the weights $w_i$ and the regularity parameter).

An obvious problem in using this construction to approximate arbitrary threshold functions
is that not every threshold function is regular. To get around this, we use a recent
result from [OS08] which shows that every threshold function $f$ can be well approximated by
a threshold function $f'$ which has two crucial properties: $f'$ is almost regular (in the sense
that it only has a few "large" weights), and its "small" weights are (appropriately scaled
versions of) the influences of the corresponding variables in $f$. For each restriction $\rho$ that fixes
the large-weight variables of $f'$, then, we may use $f'|_\rho$ as the regular threshold function $g$ of
the previous paragraph, and we obtain a distribution over approximators to $f'|_\rho$ where the
number of relevant variables for each such approximator is at most $\mathrm{Inf}(f)^2 \cdot \mathrm{poly}(1/\epsilon)$. From
this, using the probabilistic method, we are able to argue that there is a single high-accuracy
approximator for $f$ that depends on at most $\mathrm{Inf}(f)^2 \cdot \mathrm{poly}(1/\epsilon)$ variables, as required.
1.2 Second main result: approximating threshold functions to higher accuracy.

The second main result of this paper is about approximating an arbitrary $n$-variable
threshold function $f$ using a threshold function $g$ with small integer weights. Goldberg [Gol06]
and Servedio [Ser07] have observed that, because of the $2^{\Omega(n \log n)}$ lower bound [Has94] on
integer weights to exactly represent arbitrary threshold functions, it is not possible in
general to construct an $\epsilon$-approximator $g$ with integer weights $\mathrm{poly}(n, 1/\epsilon)$. Servedio [Ser07]
gave the first positive result, by showing that for every threshold function $f$ there is an
$\epsilon$-approximating threshold function $g$ in which each weight is an integer of magnitude at most
$\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^2)}$. This result and the ingredients in its proof have since played an important
role in subsequent work on threshold functions, e.g. [OS08, MORS09, DGJ+09].

Given the usefulness of [Ser07] and the poor dependence on $\epsilon$ in its bound, it is natural
to seek a stronger quantitative bound with a better dependence on $\epsilon$; in fact, this was posed
as a main open question in [Ser07]. Our second main result makes progress in this direction:

Theorem 2 (Second Main Theorem). Every $n$-variable threshold function $f$ is $\epsilon$-approximated
by a threshold function $g = \mathrm{sign}(w \cdot x - \theta)$ with $w_1, \ldots, w_n$ all integers of magnitude
$n^{3/2} \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$.

Another question posed in [Ser07] asked about small integer-weight approximators with
respect to other probability distributions beyond just the uniform distribution. As described
below, Theorem 2 can be generalized to hold under a range of non-uniform distributions.

Theorem 2 is proved using a new approach which we believe may lead to better bounds
for a range of problems considered in [OS08, MORS09, DGJ+09] which use the approach
from [Ser07]. Roughly speaking, the proof in [Ser07] and the applications in [OS08, MORS09,
DGJ+09] all rely on the fact that for suitable weight vectors $w$, the random variable $w \cdot x$
(with $x$ uniform over $\{-1,1\}^n$) can be approximated by a Gaussian. Such approximation
provides a great deal of information about $w \cdot x$, but the drawback is that the Gaussian
is only a fairly coarse approximator of $w \cdot x$ even for a weight vector as well-behaved as
$w = (1, \ldots, 1)$, and this inevitably seems to lead to bounds that are exponential in $1/\epsilon^2$
as in [Ser07, OS08, DGJ+09]. We now briefly describe how our new approach that yields
Theorem 2 gets around this barrier.

Techniques. The main conceptual difference between our new approach and the approach
in [Ser07] is this. The proof in [Ser07] starts with an arbitrary vector of weights that represent
some threshold function; intuitively this could be problematic because these weights may
provide an inconvenient representation to work with for the underlying function. In contrast,
we focus on the function itself and prove that every threshold function has a "nice" weight
vector that represents it. This allows us to exploit strong anti-concentration bounds [Hal77]
that apply only under certain assumptions on the weights; we elaborate below.

The notion of anti-concentration is an important ingredient in our approach: a random
variable has good anti-concentration if it does not assign too much mass to any small interval
of the real line. The study of anti-concentration has a rich history in probability theory, see
e.g. [DL36, Kol60, Ess68, Rog73, RV08]. Anti-concentration inequalities for discrete random
variables of the form $w \cdot x$ are known to be significantly more delicate than concentration
inequalities (i.e. "tail bounds"): while concentration typically depends on the 2-norm of $w$,
anti-concentration depends on the additive structure of the coefficients in a subtle way.^1

We remark that [Ser07] also (implicitly) uses anti-concentration bounds, in particular
ones based on Gaussian approximation (that follow from the Berry-Esseen Theorem; see
Theorem 4). In hindsight it can be seen that no stronger anti-concentration bounds can be
used in the arguments of [Ser07] because that proof considers all possible representations of
the form $\mathrm{sign}(w \cdot x - \theta)$, where $w$ ranges over all of $\mathbb{R}^n$. As an example, consider the majority
function. For the standard representation as $\mathrm{sign}(\sum_i x_i)$, the anti-concentration bound given
by the Berry-Esseen Theorem is the best possible, since an arbitrarily small interval that
contains the origin has probability mass $\Omega(1/\sqrt{n})$. On the other hand, it is possible to
come up with alternate representations $\mathrm{sign}(w \cdot x)$ for the majority function that have better
anti-concentration; this is essentially what our proof does. We prove a structural theorem
which states that every threshold function has a representation in which "many" weights are
"well-separated;" under this condition on the weights, we obtain strong anti-concentration
using a result of Halász [Hal77]. Finally, we show that strong anti-concentration yields
low-weight integer approximation to get our final desired result.
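
The claim about the standard representation of majority can be checked numerically. The sketch below (the function name `mass_at_zero` is an assumption of this illustration) computes, for even $n$, the exact probability that $x_1 + \cdots + x_n = 0$ under the uniform distribution and compares it with $\sqrt{2/(\pi n)}$, the asymptotic value obtained from Stirling's formula. Any interval containing the origin, however short, carries at least this much mass.

```python
from math import comb, pi, sqrt

def mass_at_zero(n):
    # Exact Pr[x_1 + ... + x_n = 0] for x uniform on {-1,1}^n, n even:
    # exactly n/2 of the coordinates must equal +1.
    if n % 2 != 0:
        raise ValueError("n must be even for the sum to hit 0")
    return comb(n, n // 2) / 2 ** n

for n in (10, 100, 1000):
    # The two printed columns approach each other as n grows.
    print(n, mass_at_zero(n), sqrt(2 / (pi * n)))
```

The point mass at zero decays only like $1/\sqrt{n}$, so no anti-concentration bound for this representation can beat the Berry-Esseen rate.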

Discussion: Our general approach is both modular and robust. It yields a simple and
modular proof of the $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^2)}$ upper bound from [Ser07], which was proved there
via a rather elaborate case analysis. More importantly, the new $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ bound
and its proof generalize easily to a wide range of distributions. These include
constant-biased product distributions and, using the recent result of [DGJ+09], all $K$-wise independent
distributions for sufficiently large $K$ ($K = \tilde{O}(1/\epsilon^2)$ suffices for $\epsilon$-approximation).

Organization. We prove Theorem 1 in Section 2 and Theorem 2 in Section 3. Section 4
contains the extension of Theorem 2 to certain nonuniform distributions.

2 Theorem 1: Optimally approximating threshold functions by juntas

This section is structured as follows: after giving some mathematical preliminaries, in
Section 2.2 we describe a randomized construction of approximators for regular threshold
functions. In Section 2.3 we recall the result from [OS08] that lets us approximate any threshold
function by a threshold function that is "almost" regular. In Section 2.4 we put these pieces
together to prove Theorem 1. We give some discussion and conjectures in Section 2.5.

2.1 Preliminaries. 

2.1.1 Basic Probabilistic Inequalities. 

We first recall the following standard additive Hoeffding bound: 



^1 Roughly speaking, if one forbids more and more additive structure in the $w_i$'s, then one gets better and
better anti-concentration; see e.g. [Vu08, TV08] and Chapter 7 of [TV06].
Theorem 3. Let $X_1, \ldots, X_n$ be independent random variables such that for each $j \in [n]$,
$X_j$ is supported on $[a_j, b_j]$ for some $a_j, b_j \in \mathbb{R}$, $a_j \le b_j$. Let $X = \sum_{j=1}^n X_j$. Then, for any
$t > 0$,

$$\Pr\big[|X - \mathbf{E}[X]| \ge t\big] \le 2\exp\left(\frac{-2t^2}{\sum_{j=1}^n (b_j - a_j)^2}\right).$$


The Berry-Esseen theorem is a version of the Central Limit Theorem with explicit error 
bounds: 

Theorem 4. (Berry-Esseen) Let $X_1, \ldots, X_n$ be independent random variables satisfying
$\mathbf{E}[X_i] = 0$ for all $i \in [n]$, $\sqrt{\sum_i \mathbf{E}[X_i^2]} = \sigma$, and $\sum_i \mathbf{E}[|X_i|^3] = \rho_3$. Let $S = (X_1 + \cdots + X_n)/\sigma$
and let $F$ denote the cumulative distribution function (cdf) of $S$. Then

$$\sup_x |F(x) - \Phi(x)| \le C\rho_3/\sigma^3,$$

where $\Phi$ is the cdf of a standard Gaussian random variable, and $C$ is a universal constant.
[Shi86] has shown that one can take $C = .7915$.

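Corollary 5. Let $x_1, \ldots, x_n$ denote independent uniformly random $\pm 1$ signs and let
$w_1, \ldots, w_n \in \mathbb{R}$. Write $\sigma = \sqrt{\sum_i w_i^2}$, and assume $|w_i|/\sigma \le \tau$ for all $i \in [n]$. Then for any interval
$[a, b] \subseteq \mathbb{R}$,

$$\big|\Pr[a \le w_1 x_1 + \cdots + w_n x_n \le b] - \Phi([a/\sigma, b/\sigma])\big| \le 2\tau,$$

where $\Phi([c, d]) \stackrel{\mathrm{def}}{=} \Phi(d) - \Phi(c)$. In particular,

$$\Pr[a \le w_1 x_1 + \cdots + w_n x_n \le b] \le \frac{|b - a|}{\sigma} + 2\tau.$$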

2.1.2 Fourier Analysis over $\{-1,1\}^n$.

We consider functions $f : \{-1,1\}^n \to \mathbb{R}$ (though we often focus on Boolean-valued functions
which map to $\{-1,1\}$), and we think of the inputs $x$ to $f$ as being distributed according
to the uniform probability distribution. The set of such functions forms a $2^n$-dimensional
inner product space with inner product given by $\langle f, g \rangle = \mathbf{E}[f(x)g(x)]$. The set of functions
$(\chi_S)_{S \subseteq [n]}$ defined by $\chi_S(x) = \prod_{i \in S} x_i$ forms a complete orthonormal basis for this space. We
will often simply write $x_S$ for $\prod_{i \in S} x_i$.

Given a function $f : \{-1,1\}^n \to \mathbb{R}$ we define its Fourier coefficients by $\hat{f}(S) \stackrel{\mathrm{def}}{=}
\mathbf{E}_x[f(x) x_S]$, and we have that $f(x) = \sum_S \hat{f}(S) x_S$. We refer to the maximum $|S|$ over all
nonzero $\hat{f}(S)$ as the Fourier degree of $f$. When $|S| = 1$ we usually abuse notation and write
$\hat{f}(i)$ instead of $\hat{f}(\{i\})$.

As an easy consequence of orthonormality we have Plancherel's identity $\langle f, g \rangle = \sum_S \hat{f}(S)\hat{g}(S)$,
which has as a special case Parseval's identity, $\mathbf{E}[f(x)^2] = \sum_S \hat{f}(S)^2$. From this it follows
that for every $f : \{-1,1\}^n \to \{-1,1\}$ we have $\sum_S \hat{f}(S)^2 = 1$.

We recall the well-known fact (see e.g. [KKL88]) that the total influence $\mathrm{Inf}(f)$ of any
Boolean function equals $\sum_S \hat{f}(S)^2 |S|$. Moreover, for every threshold function $f$ (in fact for
every unate function), we have that $\mathrm{Inf}_i(f) = |\hat{f}(i)|$.
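
These identities are easy to confirm by brute force on a small example. The sketch below is an illustration only: the weight vector $(3, 2, 2, 1, 1)$ with threshold 0 is an arbitrary choice (picked so that $w \cdot x$ is never zero), and the helper names are assumptions. It computes every Fourier coefficient of the resulting threshold function and compares them with the influences.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, n):
    # hat f(S) = E_x[f(x) * prod_{i in S} x_i], x uniform on {-1,1}^n.
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            total = sum(f(x) * prod(x[i] for i in S) for x in points)
            coeffs[S] = total / len(points)
    return coeffs

def influences(f, n):
    # Inf_i(f) = Pr_x[f(x) != f(x with bit i flipped)].
    points = list(product((-1, 1), repeat=n))
    return [
        sum(f(x) != f(x[:i] + (-x[i],) + x[i + 1:]) for x in points) / len(points)
        for i in range(n)
    ]

w = (3, 2, 2, 1, 1)               # hypothetical weights; w.x is always odd
n = len(w)

def f(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

coeffs = fourier_coefficients(f, n)
infl = influences(f, n)

parseval = sum(c * c for c in coeffs.values())                  # should be 1
weighted = sum(len(S) * c * c for S, c in coeffs.items())       # should be Inf(f)
print(parseval, weighted, sum(infl))
print([abs(coeffs[(i,)]) for i in range(n)], infl)              # should agree
```

The enumeration costs $4^n$ operations, so it is only meant for checking the identities on toy instances.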
2.1.3 Other Technical Preliminaries.

A function $f : \{-1,1\}^n \to \{-1,1\}$ is said to be a "junta on $\mathcal{J} \subseteq [n]$" if $f$ only depends on
the coordinates in $\mathcal{J}$. As stated earlier, we say that $f$ is a $J$-junta, $0 \le J \le n$, if it is a junta
on some set of cardinality at most $J$. For a vector $u \in \mathbb{R}^n$ we write $\|u\|_1$ to denote the $L_1$
norm of $u$, i.e. $\|u\|_1 = \sum_{i=1}^n |u_i|$. We write "$A \leftarrow \mathcal{D}$" to indicate that random variable $A$ is
distributed according to distribution $\mathcal{D}$.

Finally, we give a precise definition of the notion of a "regular" threshold function: 

Definition 6. Let $f(x) = \mathrm{sign}(w_0 + \sum_{i=1}^n w_i x_i)$ be a threshold function where $\sum_{i=1}^n w_i^2 = 1$.
We say that $f$ is $\tau$-regular if $|w_i| \le \tau$ for all $i \in [n]$.^2

2.2 Randomly constructing approximators to regular threshold 
functions. 

Fix $h_\theta(x) = \mathrm{sign}(\theta + \sum_{i=1}^m u_i x_i)$ to be a $\tau$-regular threshold function, so $\sum_{i=1}^m u_i^2 = 1$ and
$|u_i| \le \tau$ for all $i \in [m]$. Our notation emphasizes the threshold parameter $\theta$ since it will play
an important role later.

We begin by defining a distribution $\mathcal{D}$ over linear forms $L(x) = \sum_{i=1}^m c_i x_i$. The distribution
$\mathcal{D}$ is defined using the weights $u_i$ similarly to how Bruck and Smolensky [BS92] define
a distribution over polynomials using the Fourier coefficients of a Boolean function. A draw
of $L(x)$ from $\mathcal{D}$ is obtained as follows: $L(x)$ is first initialized to 0. Then the following is
independently repeated $N = \Theta(\|u\|_1^2 \cdot \frac{1}{\tau^2} \cdot \ln(1/\tau))$ times: an index $i \in [m]$ is selected with
probability $|u_i|/\|u\|_1$ and $\mathrm{sign}(u_i) x_i$ is added to $L(x)$.

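Fix any $z \in \{-1,1\}^m$. For $L \leftarrow \mathcal{D}$, we may view $L(z)$ as a sum of $N$ i.i.d. $\pm 1$-valued
random variables $Z_1(z), \ldots, Z_N(z)$, where the expectation of each $Z_j(z)$ is
$\sum_{i=1}^m \frac{|u_i|}{\|u\|_1} \mathrm{sign}(u_i) z_i = \frac{u \cdot z}{\|u\|_1}$. We thus have:

$$\mathbf{E}_{L \leftarrow \mathcal{D}}[L(z)] = \sum_{j=1}^N \mathbf{E}[Z_j(z)] = \frac{N}{\|u\|_1}(u \cdot z). \tag{1}$$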

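With $\mathcal{D}$ in hand we define a distribution $\mathcal{D}'$ over threshold functions $g_\theta$ in the following
natural way: to draw a function $g_\theta \leftarrow \mathcal{D}'$ we draw $L \leftarrow \mathcal{D}$ and set

$$g_\theta(x) = \mathrm{sign}\Big(\theta + \frac{\|u\|_1}{N} L(x)\Big). \tag{2}$$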
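
The sampling procedure behind equations (1) and (2) can be sketched in a few lines. Everything named here is an assumption made for illustration: the helper names, the convention sign(0) = +1, the toy instance ($m = 16$ equal weights, $\theta = 0.1$), and in particular the concrete constant in `num_draws`, since the text only fixes $N$ up to a $\Theta(\cdot)$.

```python
import math
import random

def sign(t):
    # Convention assumed for this sketch: sign(0) = +1.
    return 1 if t >= 0 else -1

def num_draws(u, tau):
    # One concrete N = Theta(||u||_1^2 * tau^-2 * ln(1/tau)); the constant 2
    # and the ln(2/tau) form are assumptions, not taken from the text.
    l1 = sum(abs(ui) for ui in u)
    return math.ceil(2 * l1 ** 2 * math.log(2 / tau) / tau ** 2)

def draw_linear_form(u, N, rng):
    # Coefficients c of L(x) = sum_i c[i] * x[i]: N times, pick index i with
    # probability |u_i| / ||u||_1 and add sign(u_i) * x_i to L.
    c = [0] * len(u)
    weights = [abs(ui) for ui in u]   # rng.choices normalises the weights
    for i in rng.choices(range(len(u)), weights=weights, k=N):
        c[i] += 1 if u[i] > 0 else -1
    return c

def draw_approximator(u, theta, N, rng):
    # g_theta(x) = sign(theta + (||u||_1 / N) * L(x)), as in equation (2).
    c = draw_linear_form(u, N, rng)
    l1 = sum(abs(ui) for ui in u)
    def g(x):
        return sign(theta + (l1 / N) * sum(ci * xi for ci, xi in zip(c, x)))
    return g, c

m, tau, theta = 16, 0.25, 0.1
u = [1 / math.sqrt(m)] * m            # sum of squares is 1, each |u_i| <= tau
N = num_draws(u, tau)
rng = random.Random(0)
g, c = draw_approximator(u, theta, N, rng)

def h(x):
    return sign(theta + sum(ui * xi for ui, xi in zip(u, x)))

# Empirical disagreement between g and h on random inputs.
sample = [tuple(rng.choice((-1, 1)) for _ in range(m)) for _ in range(2000)]
error = sum(g(x) != h(x) for x in sample) / len(sample)
print(N, error)
```

In this toy instance $N$ exceeds $m$, so the junta-size guarantee is vacuous; the construction pays off when $\|u\|_1^2$ is much smaller than the number of variables.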

We would like to show that for $g_\theta \leftarrow \mathcal{D}'$, the probability that $g_\theta(z)$ disagrees with $h_\theta(z)$
is "small," i.e. at most $O(\tau)$. But such a bound cannot hold for every $z \in \{-1,1\}^m$, for if
the value of $\theta + u \cdot z$ is arbitrarily close to 0 then the expected value of the argument to sign
in (2) may be arbitrarily close to 0. For $z$ such that $\theta + u \cdot z$ is not too close to 0, though,
it is possible to argue that $g_\theta(z)$ is incorrect only with small probability (over the draw of
$g_\theta \leftarrow \mathcal{D}'$). Moreover, the regularity of $h_\theta$ lets us argue that only a small fraction of inputs $z$
have $\theta + u \cdot z$ close to 0, so we can conclude that the expected error of $g_\theta$ is low. We now
provide the details.

^2 Strictly speaking, $\tau$-regularity is a property of a particular representation $\mathrm{sign}(w_0 + \sum_{i=1}^n w_i x_i)$ and not
of a threshold function $f$, which could have different representations some of which are $\tau$-regular and some
of which are not. The particular representation we are concerned with will always be clear from context. A
similar remark holds for Definition 7.

We will use the following notion of the "margin" of an input relative to a threshold 
function: 

Definition 7. Let $f(x) = \mathrm{sign}(w_0 + \sum_{i=1}^n w_i x_i)$ be a threshold function where the weights are
scaled so that $\sum_{i=1}^n w_i^2 = 1$. Given a particular input $z \in \{-1,1\}^n$ we define
$\mathrm{marg}(f, z) = |w_0 + \sum_{i=1}^n w_i z_i|$.

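Let $\mathrm{MARG}_{\theta,\tau} \stackrel{\mathrm{def}}{=} \{z \in \{-1,1\}^m : \mathrm{marg}(h_\theta, z) \ge \tau\}$ denote the set of points in $\{-1,1\}^m$
with margin at least $\tau$ under $h_\theta$. We now show that a random $g_\theta \leftarrow \mathcal{D}'$ has high expected
accuracy on each point $z \in \mathrm{MARG}_{\theta,\tau}$:

Lemma 8. For each $z \in \mathrm{MARG}_{\theta,\tau}$ we have $\Pr_{g_\theta \leftarrow \mathcal{D}'}[h_\theta(z) \ne g_\theta(z)] \le \tau$. Moreover, each
$g_\theta \leftarrow \mathcal{D}'$ is an $N$-junta.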

Proof. The latter claim is immediate so it suffices to prove the former. Fix any $z \in \mathrm{MARG}_{\theta,\tau}$,
so $|\theta + u \cdot z| \ge \tau$. We need to bound from above the probability of the "bad" event (over
the random choice of $g_\theta \leftarrow \mathcal{D}'$) that $h_\theta(z) \ne g_\theta(z)$; we refer to this bad event as $B$.

The key claim is that if $B$ occurs then it must be the case that $|L(z) - \mathbf{E}_{L \leftarrow \mathcal{D}}[L(z)]| \ge
N\tau/\|u\|_1$. For suppose that $h_\theta(z) = 1$ and $g_\theta(z) = -1$ (the other case is handled similarly).
By definition, we have that $\theta + u \cdot z \ge 0$ and $\theta + (\|u\|_1/N)L(z) < 0$. Since $z$ belongs
to $\mathrm{MARG}_{\theta,\tau}$, the first inequality gives that $\theta + u \cdot z \ge \tau$, which implies, via (1), that
$\mathbf{E}[L(z)] \ge (N/\|u\|_1)(\tau - \theta)$. The second inequality is equivalent to $L(z) < -\theta N/\|u\|_1$, and
consequently we have $\mathbf{E}[L(z)] - L(z) > N\tau/\|u\|_1$.

We thus have that $\Pr_{g_\theta \leftarrow \mathcal{D}'}[h_\theta(z) \ne g_\theta(z)] \le \Pr_{L \leftarrow \mathcal{D}}[|L(z) - \mathbf{E}[L(z)]| \ge N\tau/\|u\|_1]$. Now we
again view $L(z)$ as the sum of $N$ i.i.d. $\{-1,1\}$-valued random variables. The Hoeffding bound
(Theorem 3) yields

$$\Pr_{L \leftarrow \mathcal{D}}\Big[|L(z) - \mathbf{E}[L(z)]| \ge \frac{N\tau}{\|u\|_1}\Big] \le 2\exp\Big(-\frac{N\tau^2}{2\|u\|_1^2}\Big) \le \tau,$$

where the second inequality follows by our choice of $N$. This completes the proof of the
lemma. □
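
The choice of $N$ can be made explicit. The calculation below is a sketch with one concrete constant (the text only specifies $N$ up to the constant hidden in the $\Theta(\cdot)$). It applies Theorem 3 with each $Z_j(z)$ supported on $[-1, 1]$, so that $\sum_j (b_j - a_j)^2 = 4N$, and with $t = N\tau/\|u\|_1$.

```latex
\begin{align*}
\Pr_{L \leftarrow \mathcal{D}}\Big[\,|L(z) - \mathbf{E}[L(z)]| \ge \tfrac{N\tau}{\|u\|_1}\Big]
  &\le 2\exp\Big(\frac{-2\,(N\tau/\|u\|_1)^2}{4N}\Big)
   = 2\exp\Big(-\frac{N\tau^2}{2\|u\|_1^2}\Big), \\
2\exp\Big(-\frac{N\tau^2}{2\|u\|_1^2}\Big) \le \tau
  &\iff N \ge \frac{2\,\|u\|_1^2}{\tau^2}\,\ln\frac{2}{\tau}.
\end{align*}
```

So any $N \ge 2\|u\|_1^2 \tau^{-2} \ln(2/\tau)$ suffices, which is of the stated order $\|u\|_1^2 \cdot \tau^{-2} \cdot \ln(1/\tau)$ whenever $\tau \le 1/2$.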

We next note that by the regularity of $h_\theta$, most points in $\{-1,1\}^m$ have a large margin
(and hence are covered by Lemma 8):

Lemma 9. $\Pr_{x \in \{-1,1\}^m}[x \notin \mathrm{MARG}_{\theta,\tau}] \le 4\tau$.

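Proof. The proof is a consequence of regularity via the Berry-Esseen theorem (see
Section 2.1.1); it follows directly by applying (the last statement of) Corollary 5, noting that
$\sigma = 1$ here and that $x \notin \mathrm{MARG}_{\theta,\tau}$ exactly when $u \cdot x$ lies in the interval
$(-\theta - \tau, -\theta + \tau)$ of length $2\tau$, which gives the bound $2\tau/\sigma + 2\tau = 4\tau$. □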

Combining Lemmas 8 and 9, we get the main result of this subsection:

Lemma 10. $\mathbf{E}_{g_\theta \leftarrow \mathcal{D}'}\big[\Pr_{x \in \{-1,1\}^m}[g_\theta(x) \ne h_\theta(x)]\big] \le 5\tau$.
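
One way to see how the two lemmas combine (a sketch, not necessarily the authors' wording): exchange the expectation over $g_\theta$ with the average over $x$, split the inputs according to whether they have margin at least $\tau$, and bound the disagreement probability by 1 on the low-margin inputs.

```latex
\begin{align*}
\mathbf{E}_{g_\theta \leftarrow \mathcal{D}'}\big[\Pr_{x}[g_\theta(x) \ne h_\theta(x)]\big]
  &= \mathbf{E}_{x}\big[\Pr_{g_\theta \leftarrow \mathcal{D}'}[g_\theta(x) \ne h_\theta(x)]\big] \\
  &\le \Pr_{x}[x \notin \mathrm{MARG}_{\theta,\tau}] \cdot 1
     + \max_{z \in \mathrm{MARG}_{\theta,\tau}}
       \Pr_{g_\theta \leftarrow \mathcal{D}'}[g_\theta(z) \ne h_\theta(z)] \\
  &\le 4\tau + \tau = 5\tau.
\end{align*}
```

The first term is bounded by the $4\tau$ of Lemma 9 and the second by the $\tau$ of Lemma 8.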
2.3 Approximating threshold functions using their influences as
(almost all of) the weights.

Our next tool is the following theorem on approximating threshold functions. Roughly, it
says that every threshold function $f$ can be well approximated by a threshold function $f'$
where all but the $\mathrm{poly}(1/\epsilon)$ largest weights of $f'$ have a special structure: up to sign, they
are the values $\mathrm{Inf}_i(f)$. (Recall that for a threshold function $f$ we have $|\hat{f}(i)| = \mathrm{Inf}_i(f)$; see
Section 2.1.2.)

Theorem 11. [Theorem 17 of [OS08]] There is a fixed polynomial $\kappa(\epsilon) = \Theta(\epsilon^{144})$^3 such that
the following holds: Let $f(x) = \mathrm{sign}(w_0 + \sum_{i \in H} w_i x_i + \sum_{i \in T} w_i x_i)$ be a threshold function
over head indices $H$ and tail indices $T$, where $H = \{i : |\hat{f}(i)| \ge \kappa(\epsilon)^2\}$ and $T$ satisfies
$\sum_{i \in T} w_i^2 = 1$. Then either:

(i) $f$ is $O(\epsilon)$-close to a junta over $H$; or,

(ii) $f$ is $O(\epsilon)$-close to the threshold function $f'(x) = \mathrm{sign}\big(w_0 + \sum_{i \in H} w_i x_i + \sum_{i \in T} \frac{\hat{f}(i)}{\sigma_T} x_i\big)$,
where $\sigma_T$ denotes $\sqrt{\sum_{i \in T} \hat{f}(i)^2}$. Moreover, in this case we have $\sigma_T = \Omega(\epsilon^2)$.^4

Note that $\sum_{i \in T} (\hat{f}(i)/\sigma_T)^2 = 1$, and since $\sigma_T = \Omega(\epsilon^2)$, for each $i \in T$ we have

$$|\hat{f}(i)/\sigma_T| \le \kappa(\epsilon)^2/\sigma_T \le O(\epsilon^{288})/\Omega(\epsilon^2) = O(\epsilon^{286}). \tag{3}$$

This means that for any restriction $\rho$ fixing the variables in $H$, the function $f'|_\rho$ is
$\mathrm{poly}(\epsilon)$-regular; this is important since it will allow us to apply the results of Section 2.2 to these
restrictions.

2.4 Proof of Theorem 1.

Now we are ready to prove Theorem 1. We first show that every threshold function $f$ is
$O(\epsilon)$-approximated by a $(1 + \mathrm{Inf}(f)^2) \cdot \mathrm{poly}(1/\epsilon)$-junta threshold function, and then argue
that this yields Theorem 1. For brevity, in the rest of this subsection we write $I$ for $\mathrm{Inf}(f)$.

Let $0 < \epsilon < \frac{1}{2}$ be given and let $f$ be any $n$-variable threshold function. W.l.o.g. we may
consider a representation $f(x) = \mathrm{sign}(w_0 + \sum_{i=1}^n w_i x_i)$ in which each $w_i \ne 0$, and by scaling
the weights we may further assume that $T = [n] \setminus H$ has $\sum_{i \in T} w_i^2 = 1$.

We apply Theorem 11 to $f$. Parseval's identity implies that at most $1/\kappa(\epsilon)^4$ many indices
$i$ can have $|\hat{f}(i)| \ge \kappa(\epsilon)^2$, so we have $|H| \le 1/\kappa(\epsilon)^4 = \mathrm{poly}(1/\epsilon)$. In Case (i) we immediately
have that $f$ is $O(\epsilon)$-close to a $\mathrm{poly}(1/\epsilon)$-junta, so we suppose that Case (ii) holds, and
henceforth argue about the $O(\epsilon)$-approximator $f'$ defined in Case (ii).

We consider all $2^{\mathrm{poly}(1/\epsilon)}$ restrictions $\rho$ obtained by fixing the head variables in $H$. Our goal
is to apply the results of Section 2.2 to the functions $f'|_\rho$. As noted in Section 2.3, for each
restriction $\rho$ the resulting function $f'|_\rho$ over the tail variables in $T$ is a $\tau(\epsilon)$-regular threshold
function, where $\tau(\epsilon) = O(\epsilon^{286})$ is the function implicit in the RHS of (3) (for brevity we
henceforth write $\tau$ for $\tau(\epsilon)$). Moreover, all these restrictions are threshold functions defined
by the same linear form over the variables in $T$: they only differ in their threshold values,
i.e. the values $\theta_\rho \stackrel{\mathrm{def}}{=} w_0 + \sum_{i \in H} w_i \rho_i$.

^3 See the discussion immediately before Equation (24) of [OS08]; our $\kappa(\epsilon)$ is the $\tau(\epsilon)$ of [OS08].
^4 See Equation (24) of [OS08].

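In keeping with the notation of Section 2.2, for each restriction $\rho$ we write $h_{\theta_\rho}$ to denote
$f'|_\rho$, i.e. $h_{\theta_\rho}(x_T) = \mathrm{sign}(\theta_\rho + \sum_{i \in T} u_i x_i)$ where $u_i = \hat{f}(i)/\sigma_T$ and $x_T = (x_i)_{i \in T}$. We observe that

$$\|u\|_1 = \sum_{i \in T} |u_i| = \frac{1}{\sigma_T} \sum_{i \in T} |\hat{f}(i)| \le \frac{1}{\sigma_T} \sum_{i=1}^n |\hat{f}(i)| \le I \cdot \mathrm{poly}(1/\epsilon), \tag{4}$$

where the last inequality uses $\mathrm{Inf}_i(f) = |\hat{f}(i)|$ and $\sigma_T = \Omega(\epsilon^2)$. Recalling that $N$ equals
$\Theta(\|u\|_1^2 \cdot \frac{1}{\tau^2} \cdot \ln(1/\tau))$, we have that $N$ is at most $I^2 \cdot \mathrm{poly}(1/\epsilon)$.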

We consider a distribution $\mathcal{D}''$ over threshold functions on $\{-1,1\}^n$ defined as follows: a
draw of $g \leftarrow \mathcal{D}''$ is obtained by drawing $L \leftarrow \mathcal{D}$ and setting $g(x) = \mathrm{sign}(w_0 + \sum_{i \in H} w_i x_i +
\frac{\|u\|_1}{N} \cdot L(x_T))$. For every outcome of $g \leftarrow \mathcal{D}''$, the function $g$ depends on at most $|H| + N =
(1 + I^2)\,\mathrm{poly}(1/\epsilon)$ many variables.

It remains only to argue that some $g$ drawn from $\mathcal{D}''$ is $O(\epsilon)$-close to $f'$. Via the
probabilistic method, to do this it suffices to show that $\mathbf{E}_{g \leftarrow \mathcal{D}''}\big[\Pr_{x \in \{-1,1\}^n}[g(x) \ne f'(x)]\big] =
O(\tau)$ (recall that $\tau \ll \epsilon$). We now do this using the results of Section 2.2.

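Fix any assignment $\rho$ to the variables in $H$. By Lemma 10 we have

$$\mathbf{E}_{g \leftarrow \mathcal{D}''}\big[\Pr_{x_T \leftarrow \{-1,1\}^{|T|}}[f'|_\rho(x_T) \ne g|_\rho(x_T)]\big] \le 5\tau.$$

Averaging over all $\rho$, we get

$$\mathbf{E}_{g \leftarrow \mathcal{D}''}\big[\Pr_{x \leftarrow \{-1,1\}^n}[f'(x) \ne g(x)]\big] \le 5\tau,$$

which is the desired bound.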

So, we have shown that every threshold function $f$ is $O(\epsilon)$-close to a $(1 + I^2) \cdot \mathrm{poly}(1/\epsilon)$-junta;
we finish the proof of Theorem 1 by arguing that this implies an $I^2 \cdot \mathrm{poly}(1/\epsilon)$ junta size
bound. Let $c$ be an absolute constant such that every $f$ is $\epsilon$-close to a $(1 + I^2) \cdot (1/\epsilon)^c$-junta;
we consider different cases based on the size of $I$. If $I \ge 1$, then it is clear that $(1 + I^2)(1/\epsilon)^c \le
2I^2(1/\epsilon)^c \le I^2(1/\epsilon)^{c+1}$ (using $\epsilon < 1/2$). If $I \le \epsilon^2$, since $\sum_{|S| \ge 1} \hat{f}(S)^2 \le \sum_{|S| \ge 1} |S| \hat{f}(S)^2 = I$
(see Section 2.1.2), by Parseval's identity we get that $|\hat{f}(\emptyset)| \ge 1 - \epsilon$. This means that $f$
is $\epsilon$-close to a constant function, which is of course a 0-junta. Finally, if $\epsilon^2 < I < 1$, then
$1 + I^2 < 2 < 2I^2\epsilon^{-4} < I^2\epsilon^{-5}$, so $f$ can be $\epsilon$-approximated by an $I^2(1/\epsilon)^{c+5}$-junta. So in every
case $f$ is $\epsilon$-close to an $\mathrm{Inf}(f)^2 \cdot (1/\epsilon)^{c+5}$-junta, and Theorem 1 is proved. □

2.5 Discussion and Conjectures. 

2.5.1 Improved low-weight approximators of threshold functions.

Recall the main result of [Ser07]:

Theorem 12. [Ser07] Every $n$-variable threshold function $f$ is $\epsilon$-approximated by a threshold
function $g = \mathrm{sign}(w \cdot x - \theta)$ with $w_1, \ldots, w_n$ all integers satisfying $\sum_{i=1}^n w_i^2 \le n \cdot 2^{\tilde{O}(1/\epsilon^2)}$.
While a linear dependence on $n$ is the best possible bound which can hold uniformly for
all $n$-variable threshold functions, it is possible to give a sharper bound that depends on
$f$. Applying Theorem 12 to the threshold function junta which is given by Theorem 1, we
obtain:

Corollary 13. Every $n$-variable threshold function $f$ is $\epsilon$-approximated by a threshold
function $g = \mathrm{sign}(w \cdot x - \theta)$ with $w_1, \ldots, w_n$ all integers satisfying $\sum_{i=1}^n w_i^2 \le \mathrm{Inf}(f)^2 \cdot 2^{\tilde{O}(1/\epsilon^2)}$.

Since $\mathrm{Inf}(f)^2$ is at most $n$ (but can be much less) for every threshold function $f$, this
strengthens Theorem 12.

2.5.2 A lower bound. 

We observe that the $\mathrm{Inf}(f)^2 \cdot \mathrm{poly}(1/\epsilon)$ upper bound of Theorem 1 is nearly best possible:
no strengthening can replace this with a bound smaller than $\Omega(\mathrm{Inf}(f)^2 + 1/\epsilon^2)$.

For the $\Omega(\mathrm{Inf}(f)^2)$ term, straightforward probability arguments (see e.g. [Ser07]) show
that any 1/10-approximator for the majority function $\mathrm{sign}(x_1 + \cdots + x_n)$ must depend on $\Omega(n)$
variables. Since the total influence of majority is $\Theta(\sqrt{n})$, this shows that no subquadratic
dependence on $\mathrm{Inf}(f)$ is possible.

For the $\Omega(1/\epsilon^2)$ term, we use the following:

Proposition 14. There is a threshold function $f$ with $\mathrm{Inf}(f) = O(1)$ such that any
$\epsilon$-approximator $g$ must depend on $\Omega(1/\epsilon^2)$ variables.

Proof. Let $a = \log(1/\epsilon) - 5$ and $b = 1/\epsilon^2$. The desired $f$ is

$$f(x) = \mathrm{sign}\Big(x_1 + \cdots + x_a + \tfrac{1}{b} x_{a+1} + \cdots + \tfrac{1}{b} x_{a+b} - a\Big).$$

It can be verified that $\mathrm{Inf}_i(f) = \Theta(\epsilon)$ for $i \in [a]$ and $\mathrm{Inf}_i(f) = \Theta(\epsilon^2)$ for $i \in [a+1, a+b]$, so
$\mathrm{Inf}(f) = O(1)$. Any $\epsilon$-approximator for $f$ must be a 1/16-approximator of the subfunction $f|_\rho$
obtained by setting all the first $a$ bits to 1. But $f|_\rho$ is the majority function over $b$ variables,
and as mentioned above any 1/16-approximator must depend on $\Omega(b)$ variables. □

2.5.3 Extending to degree-$d$?

It is natural to wonder whether Theorem 1 extends to polynomial threshold functions (PTFs)
of degree $d$, i.e. Boolean functions $f(x) = \mathrm{sign}(p(x))$ where $p$ is a degree-$d$ polynomial. We
pose the following conjecture which is a broad generalization of Theorem 1.

Conjecture 1. Every degree-$d$ PTF $f$ is $\epsilon$-approximated by a $(\mathrm{Inf}(f)/\epsilon)^{O(d)}$-junta.

We suspect that even the $d = 2$ case of Conjecture 1 may be challenging, as the total
influence of low-degree polynomial threshold functions does not seem to be well understood.
2.5.4 An exponential sharpening of Bourgain's theorem?

Recall that by Parseval's identity, every Boolean function $f$ has $\sum_{S \subseteq [n]} \hat{f}(S)^2 = 1$. Since
the total influence $\mathrm{Inf}(f)$ equals $\sum_S \hat{f}(S)^2 |S|$ and the degree of each monomial $x_S$ is $|S|$, we
may interpret $\mathrm{Inf}(f)$ as the "average" Fourier degree of $f$.

With this point of view, Friedgut's theorem may be viewed as part of a sequence of three 
results, all of which essentially say that Boolean functions with low degree (in some sense) 
are close (in some sense) to juntas. The first and earliest of these results is the following 
theorem of Nisan and Szegedy: 

Theorem 15. [NS94] Every Boolean function with (maximum) Fourier degree $k$ is a
$k2^k$-junta.

This theorem imposes a strong degree condition on $f$ (that it have zero Fourier weight above
degree $k$) and gets a strong conclusion, that $f$ is identical to a $k2^k$-junta. Next, Friedgut's
theorem [Fri98] relaxed both the degree condition on $f$ and the resulting conclusion: if the
"average" Fourier degree of $f$ (i.e. $\mathrm{Inf}(f)$) is at most $k$, then $f$ is $\epsilon$-close to a $2^{O(k/\epsilon)}$-junta.
Finally and most recently, Bourgain relaxed the degree condition even further, by showing
that if $f$ puts most of its Fourier weight on low-degree monomials, then regardless of where
the remaining Fourier weight lies, $f$ must be close to a junta:

Theorem 16. IBouOStf Every Boolean function f with^2^ s ^ >k f(S) 2 < (e/k) 1 ' 2 ^ 1 ' is e-close 
to a 2°( fc ) • poly(l/e) -junta. 

Let us consider how each junta size bound changes when we restrict our attention to
threshold functions in the above theorems. We first observe that the [NS94] bound can be
exponentially improved in this case:

Proposition 17. Every threshold function with (maximum) Fourier degree $k$ is a $(2k-1)$-junta.

(This follows from the easy fact that any threshold function with $r$ relevant variables contains
a subfunction which is a $\frac{r+1}{2}$-way AND or OR.) Our Theorem 1, of course, tells us that
Friedgut's theorem can also be exponentially sharpened if $f$ is a threshold function. This
motivates the natural question of whether Bourgain's theorem can be similarly sharpened
for threshold functions. We state the following:

Conjecture 2. Every threshold function $f$ with $\sum_{|S|>k} \hat{f}(S)^2 \le (\epsilon/k)^{1/2+o(1)}$
is $\epsilon$-close to a $\mathrm{poly}(k/\epsilon)$-junta.



3 Theorem 2: approximating threshold functions to higher accuracy.



As outlined in Section 1.2, our new approach can be conceptually broken into the following
steps:

1. Show that every threshold function has a representation in which many weights are
"nice".

2. Use the "niceness" of the weights to establish anti-concentration of $w \cdot x$.

3. Finally, use the anti-concentration of $w \cdot x$ to obtain an approximator with small integer
weights.

Note that there is a delicate relationship between the first two steps: the structural result 
for the weights that is established in the first step must match the necessary conditions for 
anti-concentration in the second step. The third step is a simple generic lemma translating 
anti-concentration into low-weight approximation. 

The structure of this section is as follows: In Section 3.1 we recall the anti-concentration
results that we need to implement Step 2 in our above proof template and prove the simple
lemma that implements Step 3 in our proof template. In Section 3.2 we give a "warmup"
to our main result by using the template to give a clean and modular proof of the main
result of [Ser07]. In Section 3.3 we show how the template yields a variant of Theorem 2
which has an $n^{O(1/\epsilon^{2/3})}$ bound. This subsection includes the main new technical
contribution of Section 3, a new result on representations of threshold functions, Lemma 26.
Roughly speaking, this lemma says that every threshold function has a representation such that
many of the differences between consecutive weights are not too small. Then in Section 3.4 we
show how this $n^{O(1/\epsilon^{2/3})}$ bound can be improved to fully prove Theorem 2.

Finally, all the results of this section can be appropriately generalized to constant-biased
product distributions and $K$-wise independent distributions (but as we show, they provably
cannot be generalized to every distribution). We give these extensions in Section 4.

3.1 Anti-concentration of weighted sums of Bernoulli random variables.

We start with the formal definition of anti-concentration:

Definition 18. Let $a \in \mathbb{R}^n$ be a weight-vector and $r \in \mathbb{R}_+$. The Lévy
anti-concentration function of $a$ is defined as

$$p_r(a) = \sup_{v \in \mathbb{R}} \Pr_{x}\left[\,|a \cdot x - v| < r\,\right],$$

where $x$ is uniform over $\{-1,1\}^n$.

Thus, the anti-concentration of a weight vector $a$ is an upper bound on the probability
that $a \cdot x$ lies in any small interval (of length $2r$). An early and important result on
anti-concentration was given by Erdős [Erd45]; improving on an earlier result of Littlewood and
Offord [LO43], he proved

Theorem 19 (Erdős). Let $a = (a_1, \ldots, a_k) \in \mathbb{R}^k$, $r \in \mathbb{R}_+$ be such that
$|a_i| \ge r$ for all $i \in [k]$. Then $p_r(a) \le \binom{k}{\lfloor k/2 \rfloor}/2^k = O(k^{-1/2})$.
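For small $k$ the Lévy anti-concentration function can be computed exactly by enumeration, which makes Erdős's bound easy to sanity-check. The following is a minimal sketch, assuming the strict inequality $|a \cdot x - v| < r$ in the definition and integer (or rational) weights with $r > 0$; the function name `levy_concentration` is ours.

```python
from fractions import Fraction
from itertools import product
from math import comb

def levy_concentration(a, r):
    """p_r(a) = sup_v Pr_x[|a.x - v| < r] for x uniform on {-1,1}^k (exhaustive, exact)."""
    sums = sorted(sum(ai * xi for ai, xi in zip(a, x))
                  for x in product((-1, 1), repeat=len(a)))
    best, hi = 0, 0
    for lo in range(len(sums)):
        # Count the sums in the half-open window [sums[lo], sums[lo] + 2r).
        # Since there are finitely many sums, some open interval of length 2r
        # contains exactly the same points, and no open interval contains more.
        while hi < len(sums) and sums[hi] - sums[lo] < 2 * r:
            hi += 1
        best = max(best, hi - lo)
    return Fraction(best, len(sums))

# All-ones weights meet Erdos's bound with equality: both values are 3/8.
print(levy_concentration([1, 1, 1, 1], 1), Fraction(comb(4, 2), 2 ** 4))
```

The all-ones vector is the extremal case: every other weight vector with $|a_i| \ge r$ spreads the $2^k$ sums over at least as many distinct windows.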

A large body of subsequent work generalized this result in many different ways (see
e.g. Chapter 7 of [TV06]); anti-concentration results of this general flavor have come to be
known as "Littlewood-Offord theorems." We shall require an extension of Theorem 19 which
is due to Halász [Hal77], improving upon Erdős-Moser [Erd65] and Sárközy-Szemerédi [SS65].
While Erdős's theorem gives the best (smallest) possible anti-concentration bound assuming
that each weight $a_i$ is large, Halász's theorem gives a stronger bound under the stronger
assumption that the difference between any two weights is large:

Theorem 20 (Halász). Let $a = (a_1, \ldots, a_k) \in \mathbb{R}^k$, $r \in \mathbb{R}_+$ be such that
$|a_i - a_j| \ge r$ for all $i \ne j \in [k]$. Then $p_r(a) \le O(k^{-3/2})$.

Looking ahead, we note that the "3/2" exponent instead of "1/2" is the key to our
improvement from $2^{\tilde{O}(1/\epsilon^2)}$ to $2^{\tilde{O}(1/\epsilon^{2/3})}$.
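The gap between the two regimes is already visible for modest $k$. The sketch below compares the all-ones weight vector (large weights, no separation) with the weights $1, 2, \ldots, k$ (pairwise separation at least 1), taking $r = 1$. It relies on an observation that holds for integer weights: all sums $a \cdot x$ have the same parity, so an open interval of length 2 contains at most one attainable value and $p_1(a)$ equals the largest point mass of $a \cdot x$. The helper names are ours.

```python
from collections import Counter
from math import comb

def sum_distribution(a):
    """Counter mapping each value of a.x to the number of x in {-1,1}^k attaining it."""
    dist = Counter({0: 1})
    for ai in a:
        nxt = Counter()
        for s, c in dist.items():
            nxt[s + ai] += c
            nxt[s - ai] += c
        dist = nxt
    return dist

def max_point_mass(a):
    """For integer weights this equals p_1(a), as explained in the lead-in."""
    return max(sum_distribution(a).values()) / 2 ** len(a)

k = 12
flat = max_point_mass([1] * k)                  # Erdos regime: decays like k^(-1/2)
spread = max_point_mass(list(range(1, k + 1)))  # Halasz regime: decays like k^(-3/2)
print(flat, spread)
```

To drive the anti-concentration below $\epsilon$ one needs $k \approx 1/\epsilon^2$ weights in the first regime but only $k \approx 1/\epsilon^{2/3}$ in the second, which is where the improved exponent comes from.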

The last fact about anti-concentration that we shall need is the following simple lemma,
which says that if we extend a weight vector $a$ by adding more weights, its anti-concentration
can only improve:

Lemma 21 (Extension). Let $a \in \mathbb{R}^k$ be any $k$-dimensional weight vector and
$r \in \mathbb{R}_+$ be any non-negative real. For any $n > k$, let $a' \in \mathbb{R}^n$ be the vector
$(a_1, \ldots, a_k, a'_{k+1}, \ldots, a'_n)$ where the weights $a'_{k+1}, \ldots, a'_n$ may be any real
numbers. Then we have $p_r(a') \le p_r(a)$.

The proof is by a simple averaging argument, using the fact that for $x \leftarrow \{-1,1\}^n$
uniform random, conditioned on any outcome of the variables $x_{k+1}, \ldots, x_n$, the distribution
of $x_1, \ldots, x_k$ is still uniform.
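Written out, the averaging argument is a single conditioning step. In the sketch below, $x_{\le k}$ and $x_{>k}$ are our shorthand (not notation from the paper) for the first $k$ and the last $n-k$ coordinates of $x$, and $v \in \mathbb{R}$ is arbitrary.

```latex
\begin{align*}
\Pr_{x}\bigl[\,|a' \cdot x - v| < r\,\bigr]
  &= \mathbb{E}_{x_{>k}}\Bigl[\,\Pr_{x_{\le k}}\Bigl[\,\Bigl|a \cdot x_{\le k}
       - \Bigl(v - \textstyle\sum_{i>k} a'_i x_i\Bigr)\Bigr| < r\,\Bigr]\Bigr] \\
  &\le \mathbb{E}_{x_{>k}}\bigl[\,p_r(a)\,\bigr] \;=\; p_r(a).
\end{align*}
```

Taking the supremum over $v$ on the left-hand side gives $p_r(a') \le p_r(a)$.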

From anti-concentration to a low-weight approximator. The following simple lemma
takes us from anti-concentration to a low-weight approximator. We use it to implement Step
3 in our proof template.

Lemma 22. Let $g = \mathrm{sign}(\sum_{i=1}^n w_i x_i - \theta)$ be any threshold function. If
$p_r(w_1, \ldots, w_n) \le \epsilon$, then there exists a $2\epsilon$-approximator $h$ for $g$, where $h$
is a threshold function with integer weights each of magnitude
$O(\max_i |w_i| \cdot \sqrt{n \ln(1/\epsilon)}/r)$.

Proof. Let $\alpha = r/\sqrt{n \ln(2/\epsilon)}$. For each $i \in [n]$, let $u_i$ be the value obtained by
rounding $w_i$ to the nearest integer multiple of $\alpha$ and $v_i = u_i/\alpha \in \mathbb{Z}$. We claim
that $h(x) = \mathrm{sign}(\sum_{i=1}^n v_i x_i - \theta/\alpha)$ is the desired approximator. It is clear
that $\max_i |v_i| = O(\max_i |w_i|/\alpha)$, so it suffices to show that $h$ is $(\epsilon + \epsilon)$-close to $g$.

For $i \in [n]$, let $e_i = w_i - u_i$, so that $u \cdot x = w \cdot x - e \cdot x$. We have that
$g(x) \ne h(x)$ only if $|e \cdot x| \ge r$ or $|w \cdot x - \theta| < r$. We bound from above the
probability of each of these events by $\epsilon$. The probability of the second event is bounded by
$\epsilon$ since $\Pr[|w \cdot x - \theta| < r] \le p_r(w) \le \epsilon$. For the first event we have
$\Pr[|e \cdot x| \ge r] \le \Pr[|e \cdot x| \ge \|e\|_2 \sqrt{2 \ln(2/\epsilon)}] \le \epsilon$, where the
first inequality uses the fact $\|e\|_2 \le r/\sqrt{2 \ln(2/\epsilon)}$ (each $|e_i| \le \alpha/2$) and the
second follows from the Hoeffding bound. □
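The rounding step in this proof is short enough to state as code. The following is a minimal sketch of the construction, assuming the convention $\mathrm{sign}(0) = +1$ and using floating-point arithmetic; the function name `round_weights` and the example weights are ours. The check at the end verifies, on one small example, the containment used in the proof: $h$ and $g$ agree on every point whose margin $|w \cdot x - \theta|$ is at least $r$ (here $n$ is small enough that $|e \cdot x| < r$ holds for every $x$).

```python
import math
from itertools import product

def sign(t):
    # Convention assumed for this sketch: sign(0) = +1.
    return 1 if t >= 0 else -1

def round_weights(w, theta, r, eps):
    """Round each w_i to the nearest multiple of alpha = r / sqrt(n ln(2/eps)).

    Returns the integer weights v_i = u_i / alpha, the rescaled threshold theta / alpha,
    and alpha itself.
    """
    n = len(w)
    alpha = r / math.sqrt(n * math.log(2 / eps))
    v = [round(wi / alpha) for wi in w]
    return v, theta / alpha, alpha

w = [1.0, 0.83, 0.61, 0.47, 0.33, 0.29, 0.12, 0.07]
theta, r, eps = 0.1, 0.05, 0.1
v, theta_scaled, alpha = round_weights(w, theta, r, eps)

def g(x):
    return sign(sum(wi * xi for wi, xi in zip(w, x)) - theta)

def h(x):
    return sign(sum(vi * xi for vi, xi in zip(v, x)) - theta_scaled)

agree_off_margin = all(
    g(x) == h(x)
    for x in product((-1, 1), repeat=len(w))
    if abs(sum(wi * xi for wi, xi in zip(w, x)) - theta) >= r
)
```

The integer weights have magnitude about $\max_i |w_i| / \alpha$, so a larger anti-concentration radius $r$ translates directly into smaller integer weights.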



3.2 Warmup: Simple Proof of [Ser07] Main Result.

In this section we give a simple and modular proof of nearly the same bound as the main
result of [Ser07], following the proof template from the start of Section 3. Let
$f : \{-1,1\}^n \to \{-1,1\}$ be any threshold function.

First step: This is provided for us by the following result, which is an immediate
consequence of Lemma 14 in [OS08]. Intuitively, this result says that every threshold function has
a representation in which the $k$-th largest weight is not too small compared with the largest
weight.$^5$

Claim 23. Let $f : \{-1,1\}^n \to \{-1,1\}$ be any threshold function, let $\epsilon > 0$, and let
$k \in [n]$. There is an $\epsilon$-approximator $g = \mathrm{sign}(\sum_{i=1}^n w_i x_i - \theta)$ for $f$
with the following property: Suppose (reordering and rescaling weights if necessary) that
$1 = |w_1| \ge \cdots \ge |w_n|$. Then $|w_k| \ge 1/(k^k \sqrt{3n \ln(2/\epsilon)})$.

Second step: We apply Erdős's theorem, Theorem 19, to the weight vector $(w_1, \ldots, w_k)$
from Claim 23 (we will fix $k$ later), taking $r = 1/(k^k \sqrt{3n \ln(2/\epsilon)})$ to be the bound
from Claim 23. Theorem 19 gives $p_r(w_1, \ldots, w_k) \le O(1/\sqrt{k})$, and the Extension Lemma 21
gives that in fact $p_r(w_1, \ldots, w_n) \le O(1/\sqrt{k})$.

Third step: It remains only to fix $k = \min\{1/\epsilon^2, n\}$ and observe that the $h$ obtained from
Lemma 22 is an $O(\epsilon)$-approximator for $f$. (Note that if $1/\epsilon^2 > n$, then integer weights
$2^{\tilde{O}(1/\epsilon^2)}$ suffice to exactly represent $f$ by [MTT61].) We have thus proved:

Theorem 24. Every $n$-variable threshold function $f$ is $\epsilon$-approximated by a threshold
function $h = \mathrm{sign}(v \cdot x - \theta)$ with $v_1, \ldots, v_n$ all integers of magnitude
$n \cdot 2^{\tilde{O}(1/\epsilon^2)}$.

This is almost identical to the main result of [Ser07]; the bound of [Ser07] is slightly
stronger (it has $\sqrt{n}$ in place of $n$).
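For concreteness, the weight bound of Theorem 24 can be traced through Lemma 22 as follows. This is a sketch that takes $\max_i |w_i| = 1$, $k \le 1/\epsilon^2$, and the value $r = 1/(k^k \sqrt{3n \ln(2/\epsilon)})$ from the second step; constants are not optimized.

```latex
\begin{align*}
\max_i |v_i|
  &= O\!\left(\frac{\sqrt{n \ln(1/\epsilon)}}{r}\right)
   = O\!\left(\sqrt{n \ln(1/\epsilon)} \cdot k^k \sqrt{3 n \ln(2/\epsilon)}\right) \\
  &= O\!\left(n \ln(1/\epsilon) \cdot 2^{k \log_2 k}\right)
   = n \cdot 2^{\tilde{O}(1/\epsilon^2)}.
\end{align*}
```

The factor $k^k$ dominates: with $k = 1/\epsilon^2$ it contributes $2^{(2/\epsilon^2)\log_2(1/\epsilon)}$, which is what the $\tilde{O}$ in the exponent absorbs.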

3.3 Toward Theorem 2: An $n^{O(1/\epsilon^{2/3})}$ bound.

In this section we prove an intermediate result towards our ultimate goal of
$\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$:

Theorem 25. Every $n$-variable threshold function $f$ is $\epsilon$-approximated by a threshold
function $h = \mathrm{sign}(v \cdot x - \theta)$ with $v_1, \ldots, v_n$ all integers of magnitude
$n^{O(1/\epsilon^{2/3})}$.

We follow the same high-level proof template as the previous section. Let
$f : \{-1,1\}^n \to \{-1,1\}$ be a threshold function. We may assume w.l.o.g. that $f$ depends on
all $n$ input variables, and since the claimed bound follows again from [MTT61] if
$1/\epsilon^{2/3} > n - 2$, we assume $1/\epsilon^{2/3} \le n - 2$.

First step: Our goal now is to apply Halász's anti-concentration bound in Step 2 rather
than Erdős's theorem. To do this we need the following new result on representing threshold
functions, which intuitively says that every threshold function has a representation using
weights such that many of the differences between consecutive weights are not too small
compared to the largest weight:

5 We do not repeat the proof of Claim 23 or Lemma 14 from [OS08] here, but we note that the proof is
self-contained and rather straightforward; it follows along the lines of [MTT61]'s classic argument to upper
bound the weights required to represent any threshold function.

Lemma 26. Let $f : \{-1,1\}^n \to \{-1,1\}$ be a threshold function that depends on all $n$
variables. There is a representation $\mathrm{sign}(\sum_{i=1}^n w_i x_i - \theta)$ for $f$ with the
following property: Suppose (reordering and rescaling weights if necessary) that
$1 = |w_1| \ge \cdots \ge |w_n| > 0$. For $i \in [n-1]$ let $\Delta_i = |w_i| - |w_{i+1}|$. Then for any
$k \in [n-2]$, the $k$-th biggest element of the (multiset) $\Delta_1, \ldots, \Delta_{n-1}$ is at least
$1/(2n+2)^{2k+8}$.

We pause to contrast this result with an earlier theorem due to Håstad [Has05] that
appeared in [Ser07]. Under the same hypotheses as Lemma 26, the earlier theorem asserted
that for any $k \in [n]$ the $k$-th largest weight $w_k$ satisfies $|w_k| \ge 1/(k!(n+1))$. The proof
of the earlier theorem centers on a careful analysis of a linear program in which the variables
are the weights $w_1, \ldots, w_n$ and there are $2^n$ constraints corresponding to the $2^n$ points
$x \in \{-1,1\}^n$. To prove Lemma 26, we must now analyze a linear program with some additional
constraints which, intuitively, ensure that there are "gaps" between the weights.$^6$ We prove
Lemma 26 at the end of this subsection.

Second step: We take $k = 1/\epsilon^{2/3}$ and consider the $k$ largest differences
$\Delta_{i_1} = |w_{i_1}| - |w_{i_1+1}|, \ldots, \Delta_{i_k} = |w_{i_k}| - |w_{i_k+1}|$. Lemma 26 implies
that for all $a \ne b \in [k]$ we have $|w_{i_a} - w_{i_b}| \ge r$, for $r = 1/(2n+2)^{2k+8}$. Applying
Halász's anti-concentration bound, Theorem 20, we get that
$p_r(w_{i_1}, \ldots, w_{i_k}) \le O(k^{-3/2}) = O(\epsilon)$, and the Extension Lemma 21 further gives
$p_r(w_1, \ldots, w_n) = O(\epsilon)$.

Third step: We simply apply Lemma 22. Recalling that $r = 1/(2n+2)^{\Theta(1/\epsilon^{2/3})}$, the proof
of Theorem 25 is complete (modulo the proof of Lemma 26). □

Proof of Lemma 26. Let $f(x)$ be a threshold function. We first consider the case that $f$
is odd, i.e. $f(x) = -f(-x)$ for all $x \in \{-1,1\}^n$; in this case $f$ can be represented with a
threshold of zero. Once we have established the result for such threshold functions we will
use it to establish the general case.

By symmetry of $\{-1,1\}^n$ we may assume that $f$ is monotone increasing in each coordinate
$x_i$. By reordering coordinates we may assume that
$\mathrm{Inf}_1(f) \ge \mathrm{Inf}_2(f) \ge \cdots \ge \mathrm{Inf}_n(f) > 0$
(the final inequality is strict because $f$ depends on all $n$ coordinates).

We consider the set $W \subseteq \mathbb{R}^n$ of weight vectors $w = (w_1, \ldots, w_n)$ that satisfy
the following properties:

1. $w \cdot x \ge 1$ for every $x \in \{-1,1\}^n$ such that $f(x) = 1$. Note that since $f$ is odd these
inequalities imply the corresponding inequalities for negative points, $w \cdot x \le -1$ for
every $x \in \{-1,1\}^n$ such that $f(x) = -1$.

2. $w_i - w_{i+1} \ge 1$ for all $i = 1, 2, \ldots, n-1$, and $w_n \ge 1$.

The first set of $2^{n-1}$ constraints says that $\mathrm{sign}(w \cdot x)$ is a valid representation for
$f$ (i.e. $f(x) = \mathrm{sign}(w \cdot x)$ for all $x \in \{-1,1\}^n$). The second set of $n$ constraints
says that no two weights are precisely the same and moreover all the weights are positive. (These
are the new constraints that did not feature in the proof of [Has05].)
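To make the two constraint families concrete, the sketch below builds them for the 3-bit majority function (an odd, monotone threshold function) and tests a few candidate weight vectors for membership in $W$. The helper names `lp_constraints` and `is_feasible` are ours, and the example is only illustrative.

```python
from itertools import product

def lp_constraints(f, n):
    """Return the vectors y for which the program requires w.y >= 1:
    the positive points of f, followed by the n 'gap' vectors of the second constraint family."""
    rows = [x for x in product((-1, 1), repeat=n) if f(x) == 1]
    for i in range(n - 1):
        d = [0] * n
        d[i], d[i + 1] = 1, -1          # encodes w_i - w_{i+1} >= 1
        rows.append(tuple(d))
    rows.append(tuple([0] * (n - 1) + [1]))  # encodes w_n >= 1
    return rows

def is_feasible(w, rows):
    return all(sum(wi * yi for wi, yi in zip(w, y)) >= 1 for y in rows)

def maj3(x):
    return 1 if sum(x) > 0 else -1

rows = lp_constraints(maj3, 3)
print(len(rows))                      # 2^(n-1) + n constraints
print(is_feasible((4, 3, 2), rows))   # satisfies both families
print(is_feasible((1, 1, 1), rows))   # represents majority, but violates the gap constraints
```

The vector $(1,1,1)$ is the natural representation of majority, yet it is excluded by the second constraint family; this is exactly the situation the new constraints are designed to rule out.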

6 In fact, by considering the majority function one can verify that the $2^n$-constraint linear program of the
earlier proof is not sufficient; that LP yields a representation in which each $w_i$ is the same and hence the
"gaps" $\Delta_i$ are all 0.

Thus $W$ is the feasible set of a linear program $\mathcal{LP}$ consisting of $2^{n-1} + n$
inequalities on $w_1, \ldots, w_n$: $2^{n-1}$ inequalities correspond to points of the hypercube
$\{-1,1\}^n$ and $n$ inequalities correspond to the set

$$D_n = \{(1, -1, 0, \ldots, 0)_{1 \times n},\ (0, 1, -1, 0, \ldots, 0)_{1 \times n},\ \ldots,\ (0, \ldots, 1, -1)_{1 \times n},\ (0, \ldots, 0, 1)_{1 \times n}\}.$$

We claim that the linear program $\mathcal{LP}$ is feasible, or equivalently $W \ne \emptyset$. Indeed, by
simple standard arguments it can be shown that every odd threshold function
$f : \{-1,1\}^n \to \{-1,1\}$ has a representation $\mathrm{sign}(w \cdot x)$ such that (i) for all
$x \in \{-1,1\}^n$, it holds $w \cdot x \ne 0$, and (ii) every partial sum of the weights is distinct,
i.e. for all $I \ne J \subseteq [n]$ it holds $\sum_{i \in I} w_i \ne \sum_{j \in J} w_j$. The latter in
particular implies that the weights $w_1, \ldots, w_n$ are pairwise distinct. Now
recall that $\mathrm{Inf}_1(f) \ge \mathrm{Inf}_2(f) \ge \cdots \ge \mathrm{Inf}_n(f) > 0$ and that $f$ is
monotone increasing in all its coordinates. It is well known and easy to show (see e.g. [FP04])
that there is a representation $\mathrm{sign}(w \cdot x)$ of such a threshold function that satisfies
$w_1 > w_2 > \cdots > w_n > 0$. Therefore, we can scale the weights so that all the constraints in
the linear program $\mathcal{LP}$ are simultaneously satisfied.

Having established that $W \ne \emptyset$, we select a weight vector $w^* \in W$ that maximizes the
number of tight inequalities (i.e. satisfied with equality) in $\mathcal{LP}$. If more than one weight
vector satisfies a maximum number of tight inequalities, we choose one arbitrarily. At this
point, we invoke the following crucial claim:

Claim 27. There exists a set of $n$ points $y^{(1)}, \ldots, y^{(n)} \in f^{-1}(1) \cup D_n$ such that
$w^*$ is the unique solution of the linear system $\{w \cdot y^{(i)} = 1 \mid i = 1, 2, \ldots, n\}$.
(Henceforth, we shall denote this system by (*).)

The proof of the claim is essentially the same as in the proof of Muroga et al.'s [MTT61]
classic upper bound on the size of integer weights that are required to express LTFs over
$\{-1,1\}^n$. For completeness we include a proof of the claim in the Appendix.
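A small instance shows what the system (*) looks like. For the 3-bit majority function, the feasible vector $(4, 3, 2)$ makes three linearly independent constraints tight: the positive point $(-1, 1, 1)$ and the two gap vectors $(1, -1, 0)$ and $(0, 1, -1)$. The sketch below solves the resulting $3 \times 3$ system exactly over the rationals. The solver `solve_unique` is our own helper, and we do not claim that this particular vector is the maximizer selected in the proof; it only illustrates a vector pinned down by $n$ tight constraints.

```python
from fractions import Fraction

def solve_unique(rows, rhs):
    """Gauss-Jordan elimination over the rationals for a square system.
    Returns the unique solution, or None if the coefficient matrix is singular."""
    n = len(rows)
    M = [[Fraction(c) for c in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [c / M[col][col] for c in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [cr - factor * cc for cr, cc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

# Tight constraints w.y = 1 for 3-bit majority: one positive point and two gap vectors.
tight = [(-1, 1, 1), (1, -1, 0), (0, 1, -1)]
w_star = solve_unique(tight, [1, 1, 1])
print(w_star)
```

Because every coefficient is in $\{-1, 0, 1\}$ and every right-hand side is 1, Cramer's rule bounds the entries of such a solution in terms of determinants of small-integer matrices, which is the mechanism the rest of the proof exploits.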

Note that (*) is a system of $n$ linear equations in the variables $w_1, \ldots, w_n$ where each
coefficient of each variable in the equations is $-1$, $0$ or $1$ and the right-hand side of each
equation is 1. Since our goal is to prove a statement about the magnitude of the differences
$w_i - w_{i+1}$, $i = 1, 2, \ldots, n-1$, we define an appropriate set of $n$ new variables and rewrite
(*). In particular, we define the set of variables $\delta_1, \ldots, \delta_n$ as follows:

$$\delta_n = w_n, \qquad \delta_i = w_i - w_{i+1} \ \text{ for } i = 1, \ldots, n-1.$$

This is equivalent to

$$w_n = \delta_n, \qquad w_i = \delta_i + \cdots + \delta_n \ \text{ for } i = 1, \ldots, n-1.$$

We let $\delta$ denote $[\delta_1, \ldots, \delta_n]$. By rewriting (*), we get an equivalent system (**)
of $n$ equations in variables $\delta_1, \ldots, \delta_n$ where the coefficients of each variable in each
equation are integers in the range $[-n, n]$ and all the right-hand sides remain 1. Hence, the
linear system (**) has the unique strictly positive solution

$$\delta^*_n = w^*_n, \qquad \delta^*_i = w^*_i - w^*_{i+1} \ \text{ for } i = 1, \ldots, n-1.$$

At this point we reorder the variables $\delta_i$ in decreasing order of magnitude of the
$\delta^*_i$'s. We thus get a new set of variables $\tau_1, \ldots, \tau_n$ such that

$$\tau^*_i = i\text{-th largest of } \{\delta^*_1, \ldots, \delta^*_n\},$$

breaking ties arbitrarily. We similarly denote $\tau = [\tau_1, \ldots, \tau_n]$.

So (**) is now a system of $n$ equations in variables $\{\tau_i\}_{i \in [n]}$, where the coefficients of
each variable in each equation are integers in the range $[-n, n]$ and all the right-hand sides are
still 1. The values $\tau^*_1, \ldots, \tau^*_n$ in the unique solution of this system are strictly
positive and ordered in decreasing order of magnitude. Let us write

$$\alpha_{j1}\tau_1 + \alpha_{j2}\tau_2 + \cdots + \alpha_{jn}\tau_n = 1$$

for the $j$-th equation, where the $\alpha_{ji}$, $i, j \in [n]$, are integers in $[-n, n]$. It is not
difficult to see that the above system is equivalent to the following system of $n$ equations in
$\tau_1, \ldots, \tau_n$:

$$\alpha_{j1}\tau_1 + \alpha_{j2}\tau_2 + \cdots + \alpha_{jn}\tau_n = \alpha_{11}\tau_1 + \alpha_{12}\tau_2 + \cdots + \alpha_{1n}\tau_n \ \text{ for } j = 2, 3, \ldots, n, \quad \text{and} \quad \tau_n = \tau^*_n.$$

Each of the first $n-1$ equations is homogeneous and can be rewritten as $\tau \cdot z^{(j)} = 0$, where
$z^{(j)}$ is a vector whose entries are integers in $[-2n, 2n]$. So we have that
$\tau^* = [\tau^*_1, \ldots, \tau^*_n]$ is the unique solution to a linear system:

$$Z\tau = b \qquad (5)$$

where $Z$ is a non-singular $n \times n$ matrix with entries that are integers in $[-2n, 2n]$ and with
last row $(0, \ldots, 0, 1)$, and $b$ is $[0, 0, \ldots, 0, \tau^*_n]$.

Recall that $\tau^*_1 \ge \cdots \ge \tau^*_n > 0$. We now show that each $\tau^*_k$ is somewhat large
compared to $w^*_1$. The case $k = 1$ is easy: since $\sum_{i=1}^n \tau^*_i = w^*_1$, we have
$\tau^*_1 \ge w^*_1/n$.

Fix any $k \in \{2, \ldots, n\}$. After possibly reordering the rows of $Z$, the $(k-1)$-dimensional
vector $[1, 0, \ldots, 0]$ can be expressed as a linear combination $a_1R_1 + \cdots + a_{k-1}R_{k-1}$
where $R_i$ is the $i$-th row of the $(k-1) \times (k-1)$ upper left submatrix of $Z$. Since all entries
in $Z$ are integers in $[-2n, 2n]$, Cramer's Rule implies that each $|a_i|$ is at most the maximum
determinant of any $(k-1) \times (k-1)$ matrix with all entries in $[-2n, 2n]$; this is easily seen
to be at most $(k-1)!(2n)^{k-1}$. It follows that there is a linear combination of the first $k-1$
equations of (5) which yields

$$\tau_1 = \sum_{j=k}^{n} t_j \tau_j \qquad (6)$$

where each $|t_j|$ is at most $(k-1) \cdot (2n) \cdot (k-1)!(2n)^{k-1} \le (2(k-1)n)^k$. From (6), setting
$\tau = \tau^*$ and recalling that the $\tau^*_i$'s are positive and ordered by magnitude, we now get
$\tau^*_1 \le (n-k+1)(\max_j |t_j|)\,\tau^*_k$, which implies

$$\tau^*_k \ge \frac{\tau^*_1}{(2(k-1)n)^k (n-k+1)} \ge \frac{\tau^*_1}{(2n)^{2k+1}}.$$

Observing that $\sum_{i=1}^n \tau^*_i = w^*_1$, we have $\tau^*_1 \ge w^*_1/n$, which implies

$$\tau^*_k \ge \frac{w^*_1}{(2n)^{2k+2}}.$$

Finally, we observe that for $k \in [n-1]$, the $k$-th biggest element of the multiset
$\Delta_1, \ldots, \Delta_{n-1}$ (see the lemma statement) is at least $\tau^*_{k+1}$. (It is either
$\tau^*_{k+1}$ or $\tau^*_k$ depending on whether or not $\delta^*_n = w^*_n$ is among the $k$ largest
elements of $\{\delta^*_1, \ldots, \delta^*_n\}$.) Renormalizing so that the largest weight is 1, we have
shown that for odd $f$, the $k$-th biggest element of the multiset $\Delta_1, \ldots, \Delta_{n-1}$ is at
least $1/(2n)^{2k+4}$. This completes the proof of Lemma 26 for the case that $f$ is odd.

We now treat the case where $f$ is not odd, i.e. $f$ has a nonzero threshold. We do this
by considering the threshold function $g : \{-1,1\}^{n+1} \to \{-1,1\}$ which has zero threshold, $n$
weights the same as $f$, and an $(n+1)$-st weight which is the threshold of $f$. The result for
the zero-threshold case shows that $g$ has a representation
$\mathrm{sign}(w_1x_1 + \cdots + w_nx_n + w_{n+1}x_{n+1})$ where $|w_1| \ge \cdots \ge |w_{n+1}|$, and letting
$\Delta_i = |w_i| - |w_{i+1}|$ for $i \in [n]$, the $k$-th biggest element of $\Delta_1, \ldots, \Delta_n$ is at
least $|w_1|/(2n+2)^{2k+4}$ for any $k \in [n]$.

We now observe that for $k \in [n-2]$, the $k$-th biggest gap between the magnitudes of
the $w_i$'s that correspond to actual weights of $f$ is at least the $(k+2)$-th biggest element of
$\Delta_1, \ldots, \Delta_n$. This holds since at most two of the values $\Delta_j = |w_j| - |w_{j+1}|$ can
involve the weight $w_{j^*}$ which corresponds to the threshold of $f$, as opposed to one of its actual
weights. Since $|w_1|$ is at least as large as the absolute value of the largest actual weight of $f$,
we get that for $k \in [n-2]$, the $k$-th biggest gap between the magnitudes of the actual weights of
$f$ is at least (largest weight of $f$)$/(2n+2)^{2k+8}$. Renormalizing so that the largest magnitude
weight of $f$ is 1, Lemma 26 is proved. □

3.4 Proof of Theorem 2: A $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ bound.

Given a threshold function $f(x) = \mathrm{sign}(w \cdot x - \theta)$ such that
$|w_1| \ge \cdots \ge |w_n| > 0$, for $k \in [n]$ we denote by $\sigma_k$ the quantity
$\sqrt{\sum_{i=k}^n w_i^2}$. The analysis in [Ser07] is based on the notion of the "$\tau$-critical index":

Definition 28. We define the $\tau$-critical index $\ell(\tau)$ of a threshold function
$f = \mathrm{sign}(w \cdot x - \theta)$ as the smallest index $i \in [n]$ for which
$|w_i| \le \tau \cdot \sigma_i$. If this inequality does not hold for any
$i \in [n]$, we define $\ell(\tau) = \infty$.
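The critical index is straightforward to compute from a sorted weight vector. The following is a minimal sketch, assuming the weights are already ordered by decreasing magnitude and using 1-based indices as in Definition 28; the function name `critical_index` is ours.

```python
import math

def critical_index(w, tau):
    """Smallest 1-based index i with |w_i| <= tau * sigma_i, where
    sigma_i = sqrt(w_i^2 + ... + w_n^2); returns math.inf if no index qualifies.
    Assumes |w_1| >= |w_2| >= ... >= |w_n| > 0."""
    n = len(w)
    # Suffix sums of squares, accumulated from the tail to avoid cancellation.
    sigma_sq = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        sigma_sq[i] = sigma_sq[i + 1] + w[i] * w[i]
    for i in range(n):
        if abs(w[i]) <= tau * math.sqrt(sigma_sq[i]):
            return i + 1
    return math.inf

print(critical_index([1, 1, 1, 1], 0.5))                        # flat weights: index 1
print(critical_index([8, 4, 1, 1, 1, 1, 1, 1, 1, 1], 0.4))      # two heavy weights first
print(critical_index([2.0 ** -i for i in range(1, 7)], 0.5))    # geometric decay: no index
```

Flat weight vectors have a small critical index, while geometrically decreasing weights never reach it; the case analysis that follows splits on exactly this quantity.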

We now show how to use Theorem 25 and ideas from [Ser07] to prove Theorem 2. Given
$\epsilon > 0$, we proceed by a case analysis, as in [Ser07], based on the value of the
$\epsilon$-critical index $\ell = \ell(\epsilon)$. If $\ell > L = \tilde{O}(1/\epsilon^2)$, Case IIa in
[Ser07] says that $f$ is $\epsilon$-close to the $L$-junta $g$ obtained by truncating the smallest
$(n - L)$ weights, i.e. $g(x) = \mathrm{sign}(\sum_{i=1}^{L} w_i x_i - \theta)$. By applying Theorem 25 to
$g$, we obtain an $\epsilon$-approximator $h$ with integer weights of magnitude
$L^{O(1/\epsilon^{2/3})} = 2^{\tilde{O}(1/\epsilon^{2/3})}$, which is a $2\epsilon$-approximator for $f$. It
remains to handle the case $\ell \le L$.

To do this, we use another fact from [Ser07]: that, for every value of $\ell$, there exists an
$\epsilon$-approximator for $f$ with integer weights of magnitude
$\sqrt{n \ln(1/\epsilon)} \cdot 2^{\tilde{O}(\ell)}$. If $\ell \le K = 2/\epsilon^{2/3}$, this yields an
$\epsilon$-approximator with integer weights of magnitude $\sqrt{n} \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ and
we are done. To handle the case $K < \ell \le L$, we use a combination of Gaussian
anti-concentration (for the $n - \ell + 1$ smallest weights) and "Halász-type" anti-concentration (for
the largest $\ell - 1$ weights).

Let us proceed with the analysis. We start by rounding the weights $w_\ell, \ldots, w_n$, exactly
as in Case IIb in [Ser07], to get an $\epsilon$-approximator
$g(x) = \mathrm{sign}(\sum_{i=1}^n v_i x_i - \theta')$ for $f$ with the following properties: (i) For
$i \ge \ell$, each $v_i$ is an integer of magnitude $O(\sqrt{n \ln(1/\epsilon)})$ and
$\sum_{i=\ell}^n v_i^2 = O(n \ln(1/\epsilon)/\epsilon^2)$; (ii) It holds
$|v_1| \ge |v_2| \ge \cdots \ge |v_{\ell-1}| \ge 1$. Our goal is to establish
the existence of an $\epsilon$-approximation $h$ for $g$ with small integer weights. To achieve this, we
will use the fact that the "tail" of $g$ has small integer coefficients, i.e. the integer-valued
random variable $t(x) = \sum_{i=\ell}^n v_i x_i$ has small support.

Let $R, k > 0$ be integers. Denote by $\Omega(R, k)$ the set
$\{\pm 1\}^{k-1} \times \{-R, -R+1, \ldots, R-1, R\}$.

Now fix an integer $R_0 = O(\sqrt{n} \ln(1/\epsilon)/\epsilon)$ and denote $\Omega_0 = \Omega(R_0, \ell)$.
Consider the threshold function $h : \Omega_0 \to \{\pm 1\}$ defined by
$h(y) = \mathrm{sign}(\sum_{i=1}^{\ell-1} v_i y_i + y_\ell - \theta')$, $y \in \Omega_0$. We claim
that the threshold function $g' : \{-1,1\}^n \to \{-1,1\}$ defined by
$g'(x) = h(x_1, \ldots, x_{\ell-1}, t(x))$ is $\epsilon$-close to $g$. To see this note that $g'(x)$ equals
$g(x)$ whenever $|t(x)| = |\sum_{i=\ell}^n v_i x_i| \le R_0$, and this holds for a random $x$ with
probability $1 - \epsilon$ by a Hoeffding bound (since
$R_0 \ge \sqrt{2 \ln(2/\epsilon) \sum_{i=\ell}^n v_i^2}$ by the definition of $R_0$ and property (i) of $g$).

At this point we use the following technical generalization of Lemma 26, whose proof is
deferred to the end of this subsection:

Lemma 29. Let $h' : \Omega(R, k) \to \{\pm 1\}$ be a threshold function that depends on all $k$
variables. Suppose that $h'(y)$ has a representation as $\mathrm{sign}(\sum_{i=1}^k w'_i y_i - \theta')$
such that $|w'_1| \ge |w'_2| \ge \cdots \ge |w'_k| > 0$. There exists an alternate representation of
$h'$ as $\mathrm{sign}(\sum_{i=1}^k u_i y_i - \theta'')$ satisfying $1 = |u_1| \ge \cdots \ge |u_k| > 0$,
with the following property: For $i \in [k-1]$ let $\Delta_i = |u_i| - |u_{i+1}|$.
Then for any $j \in [k-2]$, the $j$-th biggest element of the (multiset)
$\Delta_1, \ldots, \Delta_{k-1}$ is at least

$$\frac{1}{(2k+2R) \cdot (2k+2)^{2j+8}}.$$

Applying this lemma to $h$, i.e. setting $h' = h$, $R = R_0$ and $k = \ell$, and fixing
$j = 1/\epsilon^{2/3} + 2 \le K - 2 < \ell - 2$, we obtain a representation
$\mathrm{sign}(\sum_{i=1}^{\ell} u_i y_i - \theta'')$ for $h$, such that the $j$ largest differences
$\Delta_{i_1} = |u_{i_1}| - |u_{i_1+1}|, \ldots, \Delta_{i_j} = |u_{i_j}| - |u_{i_j+1}|$ are at least $r_0$,
for
$$r_0 = \frac{1}{(2\ell+2R_0) \cdot (2\ell+2)^{2j+8}} = \frac{1}{\sqrt{n}} \cdot 2^{-\tilde{O}(1/\epsilon^{2/3})}.$$
(Note that the latter equality uses the fact that $\ell \le L$.) This yields a set of
$j' = 1/\epsilon^{2/3}$ weights $u_{i_1}, \ldots, u_{i_{j'}}$ - not including $u_\ell$ - whose
absolute differences are at least $r_0$, i.e. for all $a \ne b \in [j']$, we have
$|u_{i_a} - u_{i_b}| \ge r_0$.

We are now ready to use our proof template again. The alternate representation for $h$
from above and the definition of $g'$ imply that $g'(x)$ can be represented as
$\mathrm{sign}(\sum_{i=1}^{\ell-1} u_i x_i + \sum_{i=\ell}^{n} u'_i x_i - \theta'')$ where
$u'_i = u_\ell v_i$, $\ell \le i \le n$. By Halász's bound, Theorem 20, applied to the weights
$u_{i_1}, \ldots, u_{i_{j'}}$, and the Extension Lemma 21 as before, we conclude that
$p_{r_0}(u_1, \ldots, u_{\ell-1}, u'_\ell, \ldots, u'_n) = O(\epsilon)$. Finally, since the maximum weight
in (the new representation for) $g'$ is $O(\sqrt{n \log(1/\epsilon)})$ (as follows from the fact that
$|u_i| \le 1$, $i \in [\ell]$, and property (i) of $g$), Lemma 22 implies the existence of an
$O(\epsilon)$-approximator for $g'$ with integer weights each at most
$n^{3/2} \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$. This concludes the proof of Theorem 2. □

Proof of Lemma 29. The proof is a technical extension of Lemma 26, taking into account
the fact that the last variable of $h'$ has a non-boolean range. We consider the same linear
program as in Lemma 26 and the analysis extends essentially by following the previous proof
line-by-line. We therefore omit some details.

We similarly start with the case that $h'$ can be represented with $\theta' = 0$. Once we
have established the result for such threshold functions the general case follows easily. By
symmetry of $\Omega(R, k)$ we may assume that $h'$ is monotone increasing in each coordinate.

The linear program $\mathcal{LP}$ is the set $W \subseteq \mathbb{R}^k$ of weight vectors
$w = (w_1, \ldots, w_k)$ with the following properties:

1. $w \cdot y \ge 1$ for every $y \in \Omega(R, k)$ such that $h'(y) = 1$.

2. $w_i - w_{i+1} \ge 1$ for all $i = 1, 2, \ldots, k-1$, and $w_k \ge 1$.

Now the $\mathcal{LP}$ consists of $(2R+1) \cdot 2^{k-1} + k$ inequalities: $(2R+1) \cdot 2^{k-1}$
inequalities correspond to points of $\Omega(R, k)$ and $k$ inequalities correspond to the set

$$D_k = \{(1, -1, 0, \ldots, 0)_{1 \times k},\ (0, 1, -1, 0, \ldots, 0)_{1 \times k},\ \ldots,\ (0, \ldots, 1, -1)_{1 \times k},\ (0, \ldots, 0, 1)_{1 \times k}\}.$$

We claim that the linear program $\mathcal{LP}$ is feasible. This follows from the fact that $h'$ is
assumed to have a representation as $\mathrm{sign}(\sum_{i=1}^k w'_i y_i - \theta')$ such that
$|w'_1| \ge |w'_2| \ge \cdots \ge |w'_k| > 0$. (Even if some of the $w'_i$'s are equal, we can slightly
perturb them without changing the function. Then, we can scale everything up if necessary and
obtain a feasible solution to the $\mathcal{LP}$.) Given that $W \ne \emptyset$, we similarly select a
weight vector $w^* \in W$ that maximizes the number of tight inequalities in $\mathcal{LP}$ (breaking
ties arbitrarily). We invoke the following claim, which is proved virtually identically to Claim 27.



Claim 30. There exists a set of $k$ points $y^{(1)}, \ldots, y^{(k)} \in (h')^{-1}(1) \cup D_k$ such
that $w^*$ is the unique solution of the linear system
$\{w \cdot y^{(i)} = 1 \mid i = 1, 2, \ldots, k\}$. (Henceforth, we shall
denote this system by (*).)

Note that (*) is a system of $k$ linear equations in the variables $w_1, \ldots, w_k$ where each
coefficient of the variables $w_1, \ldots, w_{k-1}$ in the equations is in $\{-1, 0, 1\}$, the
coefficients of $w_k$ are in $\{-R, -R+1, \ldots, R-1, R\}$ and the right-hand side of each equation
is 1. Now, following the analysis of Lemma 26 line-by-line, we obtain the final system
$Z\tau = b$, where $Z$ is a non-singular $k \times k$ matrix with entries that are integers in
$[-2k, 2k]$ with the exception of one column with entries in $[-2k-2R, 2k+2R]$ and with last row
$(0, \ldots, 0, 1)$; similarly, $b$ equals $[0, 0, \ldots, 0, \tau^*_k]$.

This system has a unique solution $\tau^* = [\tau^*_1, \ldots, \tau^*_k]$, where
$\tau^*_1 \ge \cdots \ge \tau^*_k > 0$. We similarly show that each $\tau^*_j$ is somewhat large
compared to $w^*_1$. Fix any $j \in \{2, \ldots, k\}$. After possibly reordering the rows of $Z$, the
$(j-1)$-dimensional vector $[1, 0, \ldots, 0]$ can be expressed as a linear combination
$a_1R_1 + \cdots + a_{j-1}R_{j-1}$ where $R_i$ is the $i$-th row of the $(j-1) \times (j-1)$ upper left
submatrix of $Z$. Since all entries in $Z$ are integers in $[-2k, 2k]$, except for one column with
entries in $[-2k-2R, 2k+2R]$, Cramer's Rule implies that each $|a_i|$ is at most
$(2k+2R)(j-1)!(2k)^{j-1}$. It follows that there is a linear combination of the first $j-1$
equations of the final system, which yields $\tau_1 = \sum_{i=j}^{k} t_i \tau_i$, where each $|t_i|$ is
at most $(j-1) \cdot (2k) \cdot (2k+2R)(j-1)!(2k)^{j-1} \le (2k+2R)(2(j-1)k)^j$. Setting
$\tau = \tau^*$ in the equation above, we get $\tau^*_1 \le (k-j+1)(\max_i |t_i|)\,\tau^*_j$, which implies

$$\tau^*_j \ge \frac{\tau^*_1}{(2k+2R) \cdot (2(j-1)k)^j (k-j+1)} \ge \frac{\tau^*_1}{(2k+2R) \cdot (2k)^{2j+1}}. \qquad (8)$$

Observing that $\sum_{i=1}^{k} \tau^*_i = w^*_1$, we have $\tau^*_1 \ge w^*_1/k$, which gives

$$\tau^*_j \ge \frac{w^*_1}{(2k+2R) \cdot (2k)^{2j+2}}.$$

Finally, we observe that for $j \in [k-1]$, the $j$-th biggest element of the multiset
$\Delta_1, \ldots, \Delta_{k-1}$ (see the lemma statement) is at least $\tau^*_{j+1}$. (It is either
$\tau^*_{j+1}$ or $\tau^*_j$ depending on whether or not $\delta^*_k = w^*_k$ is among the $j$ largest
elements of $\{\delta^*_1, \ldots, \delta^*_k\}$.) Renormalizing so that the largest weight is 1, we have
shown that for odd $h'$, the $j$-th biggest element of the multiset $\Delta_1, \ldots, \Delta_{k-1}$ is at
least $1/((2k+2R) \cdot (2k)^{2j+4})$. This completes the proof for the case that $h'$ is odd.
The extension to general $h'$ is identical to the argument of Lemma 26. This concludes the
proof of Lemma 29. □



4 Extensions to other distributions 

Thus far the notion of approximation that we have dealt with has been approximation 
under the uniform distribution. In this section we show how our results on small-weight 
integer approximators can be extended to a fairly broad class of distributions which includes 
constant-biased product distributions and $K$-wise independent distributions. Our proofs in
this more general setting follow the same new approach we have used throughout Section 3,
of constructing a "nice" representation and then using anti-concentration.

Formally, given a probability distribution $\mathcal{D}$ on $\{-1,1\}^n$, the distance between
$f, g : \{-1,1\}^n \to \{-1,1\}$ with respect to $\mathcal{D}$ is defined as
$\mathrm{dist}_{\mathcal{D}}(f, g) = \Pr_{x \sim \mathcal{D}}[f(x) \ne g(x)]$. If
$\mathrm{dist}_{\mathcal{D}}(f, g) \le \epsilon$, we say that $f$ and $g$ are $\epsilon$-close w.r.t.
$\mathcal{D}$ and that $g$ is an $\epsilon$-approximator to $f$ (w.r.t. $\mathcal{D}$). We consider the
following question: Given a threshold function $f : \{-1,1\}^n \to \{-1,1\}$, an error parameter
$\epsilon > 0$ and a distribution $\mathcal{D}$, does there exist an $\epsilon$-approximator
$g$ for $f$ w.r.t. $\mathcal{D}$ with small integer weights?

In Section 4.1 we discuss anti-concentration under general distributions and record the
anti-concentration inequalities that we will use. In Section 4.2 we generalize the basic
$\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^2)}$ result of [Ser07] and in Section 4.3 we generalize the
$\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ bound of Theorem 2. Because full proofs would be
quite lengthy and occasionally repetitive, in some cases we only provide sketches.



4.1 Anti-concentration under general distributions. 

We start by defining the notion of anti-concentration for general measures on the hypercube
$\{-1,1\}^n$.

Definition 31. Fix a distribution $\mathcal{D}$ on $\{-1,1\}^n$. Let $a \in \mathbb{R}^n$ be a weight vector and
$r \in \mathbb{R}_+$. The Lévy anti-concentration function of $a$ w.r.t. $\mathcal{D}$ is defined as

$$p_r(a, \mathcal{D}) \stackrel{\mathrm{def}}{=} \sup_{v \in \mathbb{R}} \Pr_{x \sim \mathcal{D}}[\,|a \cdot x - v| \le r\,].$$

Let $x \in \{-1,1\}^n$ be drawn from $\mathcal{D}$ and consider the random variable
$S = a \cdot x = \sum_{i=1}^{n} a_i x_i$ where $a \in \mathbb{R}^n$. While it is clear that the random variable $S$ can
be very concentrated if $\mathcal{D}$ is arbitrary, there are broad classes of interesting distributions
for which it is possible to establish good anti-concentration under suitable assumptions for
the weights. In particular, as we now describe, this is possible for constant-biased product
distributions and $K$-wise independent distributions for large enough $K$.
Product Distributions. We start with the case of product distributions. Let $p_i \in (0,1)$,
$i \in [n]$. Let $\mu_{p_i}$ be the distribution on the two-point space $\{-1,1\}$ with $\mu_{p_i}(1) = p_i$. We
denote the corresponding product distribution by $\bigotimes_{i=1}^{n} \mu_{p_i}$. For such a product
distribution we denote

$$p = \min_i \{p_i, 1 - p_i\} \in (0, 1/2].$$

We henceforth write $\mathcal{D}_{\mathrm{prod}}$ to denote a generic product distribution for which $p = \Omega(1)$ (we
call such distributions constant-biased product distributions), and we omit the dependence
on $p$ in our bounds.
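As an illustration of Definition 31 for product distributions, the sketch below computes $p_r(a, \mathcal{D})$ exactly for small $n$. The function names `sum_distribution` and `levy` are hypothetical, and the approach (brute-force enumeration with exact rationals) is chosen here for clarity, not efficiency. It relies on the observation that the supremum over $v$ is attained by a closed window $[s, s + 2r]$ whose left endpoint $s$ is an attainable value of $a \cdot x$.

```python
from fractions import Fraction
from itertools import product

def sum_distribution(a, p):
    """Exact law of S = a . x when the x_i are independent with Pr[x_i = 1] = p[i]."""
    law = {}
    for x in product((-1, 1), repeat=len(a)):
        mass = Fraction(1)
        for xi, pi in zip(x, p):
            mass *= pi if xi == 1 else 1 - pi
        s = sum(ai * xi for ai, xi in zip(a, x))
        law[s] = law.get(s, Fraction(0)) + mass
    return law

def levy(a, p, r):
    """p_r(a, D) = sup_v Pr[|a . x - v| <= r] for the product distribution D given by p.

    Any closed window of length 2r can be slid to the right until its left
    endpoint hits an attainable value of a . x without losing mass, so it is
    enough to try windows [s, s + 2r] with s in the support of a . x.
    """
    law = sum_distribution(a, p)
    best = Fraction(0)
    for left in law:
        mass = sum((m for s, m in law.items() if left <= s <= left + 2 * r), Fraction(0))
        best = max(best, mass)
    return best

half = Fraction(1, 2)
# Uniform distribution, all weights equal to 1, window radius r = 1.
p_uniform = levy([1, 1, 1, 1], [half] * 4, 1)
# The weights 1, 2, 4, 8 make all 16 values of a . x distinct.
p_spread = levy([1, 2, 4, 8], [half] * 4, 1)
# A biased product distribution with the same equal weights.
p_biased = levy([1, 1, 1, 1], [Fraction(3, 4)] * 4, 1)
```

Comparing `p_uniform` with `p_spread` shows how additive structure in the weights affects anti-concentration, and `p_biased` shows that the bias of the distribution matters as well.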

We mention two anti-concentration inequalities under product distributions. The following
results intuitively say that, for any constant-biased product distribution, the random
variable $S = a \cdot x = \sum_{i=1}^{n} a_i x_i$ has good anti-concentration if the weights have appropriate
structure.

The first such result is a generalization of Erdős's theorem, Theorem 19.

Theorem 32. Let $a = (a_1, \dots, a_k) \in \mathbb{R}^k$, $r \in \mathbb{R}_+$ be such that $|a_i| \ge r$ for all $i \in [k]$. Then
$p_r(a, \mathcal{D}_{\mathrm{prod}}) \le O(k^{-1/2})$.

For the case that $p_i = p$ for all $i \in [k]$, Theorem 32 can be proved in an elementary way
using Sperner theory (similar to the proof of Theorem 19). For the case of different $p_i$'s, a
proof can be obtained using the Fourier analytic methods in [Hal77] (see [AGKW09] for an
explicit reference).

The second theorem is a generalization of Theorem 20.

Theorem 33. Let $a = (a_1, \dots, a_k) \in \mathbb{R}^k$, $r \in \mathbb{R}_+$ be such that $|a_i - a_j| \ge r$ for all $i \neq j \in [k]$.
Then $p_r(a, \mathcal{D}_{\mathrm{prod}}) \le O(k^{-3/2})$.

This theorem can also be obtained using the techniques of [Hal77].
Finally, the extension lemma will again be useful for us:

Lemma 34 (Extension). Let $a \in \mathbb{R}^k$ be any $k$-dimensional weight vector and $r \in \mathbb{R}_+$ any
non-negative real. For any $n > k$, let $a' \in \mathbb{R}^n$ be the vector $(a_1, \dots, a_k, a'_{k+1}, \dots, a'_n)$ where
the weights $a'_{k+1}, \dots, a'_n$ may be any real numbers. Then we have
$p_r(a', \mathcal{D}_{\mathrm{prod}}) \le p_r(a, \mathcal{D}_{\mathrm{prod}})$.

As in the uniform distribution case, this follows directly from independence. 
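For completeness, the independence argument can be spelled out as follows. The notation is introduced only for this derivation: write $x = (y, z)$ with $y$ the first $k$ coordinates and $z$ the remaining $n - k$, and let $a'' = (a'_{k+1}, \dots, a'_n)$. For any fixed $v \in \mathbb{R}$:

```latex
\begin{align*}
\Pr_{x \sim \mathcal{D}_{\mathrm{prod}}}\bigl[\,|a' \cdot x - v| \le r\,\bigr]
  &= \mathbf{E}_{z}\Bigl[\,\Pr_{y}\bigl[\,|a \cdot y - (v - a'' \cdot z)| \le r\,\bigr]\Bigr] \\
  &\le \mathbf{E}_{z}\bigl[\,p_r(a, \mathcal{D}_{\mathrm{prod}})\,\bigr]
   \;=\; p_r(a, \mathcal{D}_{\mathrm{prod}}).
\end{align*}
```

The equality uses that $y$ and $z$ are independent under a product distribution, so fixing $z$ does not change the law of $y$. The inequality holds because $v - a'' \cdot z$ is a fixed real number once $z$ is fixed. Taking the supremum over $v$ on the left-hand side gives the lemma.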

$K$-wise Independent Distributions. A distribution $\mathcal{D}$ on $\{-1,1\}^n$ is $K$-wise independent
if the projection of $\mathcal{D}$ onto any $K$ indices is uniformly distributed over $\{-1,1\}^K$. The
class of $K$-wise independent distributions over $\{-1,1\}^n$ is a broad and important class of
distributions that has received much study (see [Wig94, BR94] and many other references)
because of its usefulness in derandomization and other applications.
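The definition can be checked mechanically on small examples. The sketch below is illustrative only: the function name `is_k_wise_independent` and the representation of a distribution as a dictionary from points to exact probabilities are assumptions made here. The example is the uniform distribution over the even-parity points of $\{-1,1\}^3$, a standard instance of a pairwise independent distribution that is not fully independent.

```python
from fractions import Fraction
from itertools import combinations, product

def is_k_wise_independent(pmf, n, k):
    """Check that the projection of pmf onto every set of k coordinates is uniform on {-1, 1}^k."""
    target = Fraction(1, 2 ** k)
    for idx in combinations(range(n), k):
        marginal = {}
        for x, mass in pmf.items():
            key = tuple(x[i] for i in idx)
            marginal[key] = marginal.get(key, Fraction(0)) + mass
        if any(marginal.get(pattern, Fraction(0)) != target
               for pattern in product((-1, 1), repeat=k)):
            return False
    return True

# Uniform distribution over the four points of {-1, 1}^3 with x_1 * x_2 * x_3 = 1.
parity = {x: Fraction(1, 4)
          for x in product((-1, 1), repeat=3)
          if x[0] * x[1] * x[2] == 1}

pairwise = is_k_wise_independent(parity, 3, 2)   # every pair of bits is uniform
threewise = is_k_wise_independent(parity, 3, 3)  # the three bits together are not
```

Under this distribution any two bits determine the third, which is the kind of dependence that prevents fixing most of the bits and treating the rest as independent.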

We note that the extension lemma fails for $K$-wise independent distributions, since one
cannot fix most of the bits and argue that the remaining bits are independent. For $K$-wise
independent distributions, we thus need to establish anti-concentration in a different way.
This can be done using the recent result of [DGJ+09].

We shall denote by $\mathcal{D}_{K\mathrm{wise}}$ a generic $K$-wise independent distribution on $\{-1,1\}^n$. We
recall the main result of [DGJ+09]:
Theorem 35 (Theorem 1.2 in [DGJ+09], rephrased). Let $h(x) = \mathrm{sign}(\sum_{i=1}^{n} w_i x_i - \theta)$ be any
threshold function. Then we have

$\left| \Pr_{x \sim \mathcal{D}_{K\mathrm{wise}}}[h(x) = 1] - \Pr_{x \sim \mathcal{U}}[h(x) = 1] \right| \le O$

As an immediate corollary we obtain:

Fact 36. Let $a \in \mathbb{R}^n$ and $r \in \mathbb{R}_+$. Then

$p_r(a, \mathcal{D}_{K\mathrm{wise}}) \le p_r(a, \mathcal{U}) + O$

We require the above two anti-concentration probabilities to be $\epsilon$-close to each other, so we
henceforth fix

$$K \stackrel{\mathrm{def}}{=} \Theta(1/\epsilon^2 \cdot \log^2(1/\epsilon)) = \tilde{O}(1/\epsilon^2).$$
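The step from Theorem 35 to Fact 36 can be made explicit. In the sketch below, $\delta_K$ is notation introduced here for the error bound of Theorem 35; it does not appear in the original text. For any fixed $v \in \mathbb{R}$ and $r \ge 0$:

```latex
\begin{align*}
\mathbf{1}\bigl[\,|a \cdot x - v| \le r\,\bigr]
  &= \mathbf{1}\bigl[\,a \cdot x \ge v - r\,\bigr] - \mathbf{1}\bigl[\,a \cdot x > v + r\,\bigr], \\
\Bigl|\Pr_{x \sim \mathcal{D}_{K\mathrm{wise}}}\bigl[\,|a \cdot x - v| \le r\,\bigr]
      - \Pr_{x \sim \mathcal{U}}\bigl[\,|a \cdot x - v| \le r\,\bigr]\Bigr|
  &\le 2\,\delta_K .
\end{align*}
```

Each indicator on the right of the first line is the indicator of an event $\{h(x) = 1\}$ for a threshold function $h$ (on the finite hypercube a strict inequality can be turned into a non-strict one by shifting the threshold slightly), so Theorem 35 applies to each term separately and the triangle inequality gives the second line. Taking the supremum over $v$ yields Fact 36, with the factor 2 absorbed into the $O(\cdot)$.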

Fact 36, Theorem 19 and Lemma 21 now yield:

Theorem 37. Let $a = (a_1, \dots, a_n) \in \mathbb{R}^n$, $r \in \mathbb{R}_+$ and suppose that $|a_i| \ge r$ for all $i \in [\ell]$.
Then

$$p_r(a, \mathcal{D}_{K\mathrm{wise}}) \le O(\ell^{-1/2}) + \epsilon.$$

Similarly Fact 36, Theorem 20 and Lemma 21 together yield:

Theorem 38. Let $a = (a_1, \dots, a_n) \in \mathbb{R}^n$, $r \in \mathbb{R}_+$ and suppose that $|a_i - a_j| \ge r$ for all
$i \neq j \in [\ell]$. Then

$$p_r(a, \mathcal{D}_{K\mathrm{wise}}) \le O(\ell^{-3/2}) + \epsilon.$$

4.2 A $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^2)}$ bound for product distributions and $K$-wise
independent distributions.

The following lemma translates anti-concentration to low-integer weight approximation for
any distribution $\mathcal{D}$.

Lemma 39. Fix a distribution $\mathcal{D}$ on $\{-1,1\}^n$. Let $g = \mathrm{sign}(w \cdot x - \theta)$ be any threshold
function. If $p_r(w, \mathcal{D}) \le \epsilon$, then there exists an $\epsilon$-approximator $h$ for $g$ w.r.t. $\mathcal{D}$, where $h$ is
a threshold function with integer weights each of magnitude $O(\max_i |w_i| \cdot n/r)$.

We note that the bound on the magnitude of the weights is now linear in $n$ (as opposed to $\sqrt{n}$
in Lemma 22) and that no dependence on $\epsilon$ appears in the bound. The proof is essentially
the same as the proof of Lemma 22 but with the following small change: in Lemma 22 we
rounded to integer multiples of $r/\sqrt{n \log(1/\epsilon)}$ and used a Hoeffding bound to show that the
probability that the error vector $e$ has $|e \cdot x| \ge r$, for a uniformly random $x \in \{-1,1\}^n$, is
upper bounded by $\epsilon$. Since the Hoeffding bound does not apply for general distributions, we
now round to multiples of $r/n$. (This is what makes the dependence on $n$ worse by a $\sqrt{n}$
factor.) As a consequence, the corresponding error probability is now 0, since $\|e\|_1 < r$.
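The rounding step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `round_weights`, the choice of rounding to the nearest multiple of $r/n$, the convention sign(0) = +1, and the numerical instance are all chosen here and are not taken from the paper. Exact rational arithmetic is used so that the checks are not affected by floating-point error.

```python
from fractions import Fraction
from itertools import product

def round_weights(w, theta, r):
    """Round w to multiples of r/n and rescale so that the weights are integers.

    Returns (u, t) with integer u_i = round(w_i * n / r) and threshold
    t = theta * n / r, so that sign(u . x - t) = sign(w' . x - theta), where
    w'_i = u_i * r / n is w_i rounded to the nearest multiple of r/n.
    """
    n = len(w)
    scale = Fraction(n) / r
    u = [round(wi * scale) for wi in w]
    return u, theta * scale

def sign(z):
    return 1 if z >= 0 else -1

# A small hypothetical instance with exact rational weights.
w = [Fraction(1), Fraction(5, 7), Fraction(-3, 11), Fraction(2, 13), Fraction(1, 17)]
theta = Fraction(1, 10)
r = Fraction(1, 20)
n = len(w)

u, t = round_weights(w, theta, r)

# The error vector e = w - w' has l1 norm at most n * (r / (2n)) = r/2 < r.
e = [wi - ui * r / n for wi, ui in zip(w, u)]
l1_error = sum(abs(ei) for ei in e)

# On every point whose margin |w . x - theta| exceeds r, the rounded function agrees.
disagreements_outside_margin = 0
for x in product((-1, 1), repeat=n):
    margin = sum(wi * xi for wi, xi in zip(w, x)) - theta
    if abs(margin) > r:
        rounded = sum(ui * xi for ui, xi in zip(u, x)) - t
        if sign(margin) != sign(rounded):
            disagreements_outside_margin += 1

max_weight = max(abs(ui) for ui in u)
```

Since $|e \cdot x| \le \|e\|_1 < r$ for every $x$, the rounded function can only err on points with $|w \cdot x - \theta| \le r$, a set of probability at most $p_r(w, \mathcal{D})$ under any distribution $\mathcal{D}$. The integer weights have magnitude about $\max_i |w_i| \cdot n/r$.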

To obtain the desired $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^2)}$ bound, we will use the following claim, that follows
from Claim 23 by setting $\epsilon = 1/(2^n + 1)$. (The claim also follows as an immediate corollary
of a theorem, due to Håstad [Has05], that appears as Theorem 6.5 in [Ser07].)

Claim 40. Let $f : \{-1,1\}^n \to \{-1,1\}$ be any threshold function and let $k \in [n]$. There is
an exact representation for $f$ as $\mathrm{sign}(\sum_{i=1}^{n} w_i x_i - \theta)$ with the following property: Suppose
(reordering and rescaling weights if necessary) that $1 = |w_1| \ge \cdots \ge |w_n|$. Then
$|w_k| \ge 1/(4 \cdot k^k \cdot n)$.

At this point we have the tools to obtain a weight bound of $n^2 \cdot 2^{\tilde{O}(1/\epsilon^2)}$ for both product
distributions and $K$-wise independent distributions. However, the following easily verified
remarks can be used to improve the dependence on $n$ to linear in both cases.

1. For the class of product distributions, Lemma 39 applies with the same quantitative
bound as Lemma 22 (i.e. $O(\max_i |w_i| \cdot \sqrt{n \ln(1/\epsilon)}/r)$). The proof is essentially the
same as that of Lemma 22 since the Hoeffding bound only requires independence.

2. For the class of $K$-wise independent distributions, we can obtain the quantitative
bound $\max_i |w_i| \cdot (\sqrt{n}/r) \cdot O(1/\epsilon^2)$ in Lemma 39. The proof is essentially the same as
Lemma 22 but using an appropriate tail inequality for $K$-wise independent distributions
(e.g. Lemma 2.3 in [BR94]) instead of the Hoeffding bound.

3. Claim 23 applies with the same quantitative bound for any constant-biased product
distribution and with the bound $1/(k^k \cdot \sqrt{n} \cdot O(1/\epsilon^2))$ for any $K$-wise independent
distribution. For product distributions the proof remains essentially unchanged from
the uniform distribution proof in [OS08], again because the Hoeffding bound only
requires independence. For $K$-wise independent distributions, again one uses a tail
bound for $K$-wise independent distributions.
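The $n^2$ bound mentioned before these remarks can be traced through the parameters as follows. This is only a sketch of the bookkeeping, under assumptions made here: the bound of Claim 40 is read as $|w_k| \ge 1/(4 k^k n)$, the weights are normalized so that $\max_i |w_i| = 1$, and we take $\ell = \min\{1/\epsilon^2, n\}$ and $r = |w_\ell|$, suppressing the constants needed for Theorem 32 to give error at most $\epsilon$.

```latex
\begin{align*}
r = |w_\ell| \;\ge\; \frac{1}{4\,\ell^{\ell}\,n}
\quad\Longrightarrow\quad
O\!\Bigl(\max_i |w_i| \cdot \frac{n}{r}\Bigr)
  \;\le\; O\bigl(4\,\ell^{\ell}\,n^{2}\bigr)
  \;=\; n^{2} \cdot 2^{O(\ell \log \ell)}
  \;=\; n^{2} \cdot 2^{\tilde{O}(1/\epsilon^{2})}.
\end{align*}
```

Remarks 1 to 3 replace the factor $n/r$ of Lemma 39 by roughly $\sqrt{n}/r$ and allow $r$ to be about $1/\sqrt{n}$ rather than $1/n$ (up to factors depending only on $\epsilon$), which is where the improvement from $n^2$ to $n$ comes from.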

Thus we have the main results of this subsection: 

Theorem 41. Let $f$ be any $n$-variable threshold function. Then

1. $f$ is $\epsilon$-approximated with respect to $\mathcal{D}_{\mathrm{prod}}$ by a threshold function $g = \mathrm{sign}(w \cdot x - \theta)$
with $w_1, \dots, w_n$ all integers of magnitude $n \cdot 2^{\tilde{O}(1/\epsilon^2)}$; and

2. $f$ is $\epsilon$-approximated with respect to $\mathcal{D}_{K\mathrm{wise}}$ by a threshold function $g = \mathrm{sign}(w \cdot x - \theta)$
with $w_1, \dots, w_n$ all integers of magnitude $n \cdot 2^{\tilde{O}(1/\epsilon^2)}$.

Proof. For part (1), we set $\ell = \min\{1/\epsilon^2, n\}$. We apply Theorem 32 to the weight vector
$(w_1, \dots, w_\ell)$ from Claim 23 (or more precisely from the variant described in remark 3 above)
taking $r = (1/\sqrt{n}) \cdot 2^{-\tilde{O}(1/\epsilon^2)}$. Theorem 32 gives $p_r((w_1, \dots, w_\ell), \mathcal{D}_{\mathrm{prod}}) \le \epsilon$, and Lemma 34
gives $p_r(w, \mathcal{D}_{\mathrm{prod}}) \le \epsilon$. An application of Lemma 39 (see remark 1 above) completes the
proof.

For part (2), we set $\ell = \min\{1/\epsilon^2, n\}$. We apply Theorem 37 to the weight vector
$(w_1, \dots, w_\ell)$ from the modified Claim 23 (see remark 3 above), taking $r = (1/\sqrt{n}) \cdot 2^{-\tilde{O}(1/\epsilon^2)}$.
Theorem 37 directly yields $p_r(w, \mathcal{D}_{K\mathrm{wise}}) \le 2\epsilon$. An application of Lemma 39 (see remark 2
above) completes the proof. □
4.3 A $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ bound for product distributions and $K$-wise
independent distributions.

In this subsection we first prove a bound of $n^{O(1/\epsilon^{2/3})}$ (analogous to Section 3.3). We then
sketch how the arguments of [Ser07] can be extended to the product distribution and $K$-wise
independent distribution settings to obtain the final result (analogous to Section 3.4).

4.3.1 An $n^{O(1/\epsilon^{2/3})}$ bound.

With the required machinery in place it is straightforward to prove:

Theorem 42. Let $f$ be any $n$-variable threshold function. Then:

1. $f$ is $\epsilon$-approximated with respect to $\mathcal{D}_{\mathrm{prod}}$ by a threshold function $g = \mathrm{sign}(w \cdot x - \theta)$
with $w_1, \dots, w_n$ all integers of magnitude $n^{O(1/\epsilon^{2/3})}$.

2. $f$ is $\epsilon$-approximated with respect to $\mathcal{D}_{K\mathrm{wise}}$ by a threshold function $g = \mathrm{sign}(w \cdot x - \theta)$
with $w_1, \dots, w_n$ all integers of magnitude $n^{O(1/\epsilon^{2/3})}$.

Proof. For (1), we set $\ell = \min\{1/\epsilon^{2/3}, n\}$. We apply Theorem 33 to the weight vector
$(w_{i_1}, \dots, w_{i_\ell})$ obtained from Lemma 26, taking $r = 1/(2n+2)^{2\ell+8} = n^{-O(1/\epsilon^{2/3})}$.
Theorem 33 gives $p_r((w_{i_1}, \dots, w_{i_\ell}), \mathcal{D}_{\mathrm{prod}}) \le O(\epsilon)$, and Lemma 34 gives
$p_r(w, \mathcal{D}_{\mathrm{prod}}) \le O(\epsilon)$. An application of Lemma 39 completes the proof.

For (2), we again set $\ell = \min\{1/\epsilon^{2/3}, n\}$. We apply Theorem 38 to the weight vector
$(w_{i_1}, \dots, w_{i_\ell})$ from Lemma 26, taking $r = 1/(2n+2)^{2\ell+8} = n^{-O(1/\epsilon^{2/3})}$. Theorem 38 directly
yields $p_r(w, \mathcal{D}_{K\mathrm{wise}}) \le 2\epsilon$. An application of Lemma 39 completes the proof. □
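The parameter choices in this proof can be checked directly. The computation below is a sketch under assumptions made here: $\ell = \epsilon^{-2/3} \le n$, the bound of Theorem 33 is read as $O(k^{-3/2})$, the representation is normalized so that $\max_i |w_i| = 1$, and constants are suppressed.

```latex
\begin{align*}
\ell = \epsilon^{-2/3}
  &\;\Longrightarrow\; O(\ell^{-3/2}) = O(\epsilon), \\
r = (2n+2)^{-(2\ell+8)}
  &\;\Longrightarrow\; O\!\Bigl(\max_i |w_i| \cdot \frac{n}{r}\Bigr)
   = O\bigl(n\,(2n+2)^{2\ell+8}\bigr)
   = n^{O(1/\epsilon^{2/3})}.
\end{align*}
```

The first line is why $\ell$ is chosen as $1/\epsilon^{2/3}$: it makes the anti-concentration bound match the target error. The second line is the weight bound produced by Lemma 39 with this $r$.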

4.3.2 Completing the proof: a $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ bound.

Finally, in this section we show how to obtain a $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ bound for constant-biased
product distributions and for $K$-wise independent distributions:

Theorem 43. Let $\mathcal{D}$ be either a constant-biased product distribution or a $K$-wise independent
distribution. Every $n$-variable threshold function $f$ is $\epsilon$-approximated w.r.t. $\mathcal{D}$ by a threshold
function $g = \mathrm{sign}(w \cdot x - \theta)$ with $w_1, \dots, w_n$ all integers of magnitude $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$.

The theorem is proved following an approach analogous to Section 3.4. To do this,
one needs to check that the ideas from [Ser07] can be appropriately generalized to product
distributions and $K$-wise independent distributions. We do not present the details of the
proofs but only briefly sketch the ingredients that make these generalizations possible.

Fix a distribution $\mathcal{D}$ in either of the aforementioned classes. Let $f$ be a threshold function
and $\epsilon > 0$ be given. The first thing one must argue is that for an appropriate threshold
$L = O(1/\epsilon^2)$, if the $\epsilon$-critical index $\ell$ is bigger than $L$, then $f$ is $\epsilon$-close with respect to $\mathcal{D}$ to
the $L$-junta $g$ obtained by truncating the smallest $n - L$ weights. If $\mathcal{D}$ is a constant-biased
product distribution, this can be done by an analysis very similar to Case IIa in [Ser07].
The only difference is in constant factors that eventually lead the threshold $L$ to increase
by a factor of $1/p$, where $p$ is the bias of the distribution. If $\mathcal{D}$ is a $K$-wise independent
distribution, we can no longer use the Hoeffding bound that is used in Case IIa from [Ser07].
However, it turns out that Chebyshev's bound can be used instead; indeed, [DGJ+09] does
precisely this to show that if the $\epsilon$-critical index is greater than $L$, then $f$ is $\epsilon$-close to the
$L$-junta $g$ w.r.t. any $K$-wise independent distribution.

To conclude the sketch it suffices to argue that, for every value of $\ell$, there exists an
$\epsilon$-approximator for $f$ (w.r.t. $\mathcal{D}$) with integer weights of magnitude $\mathrm{poly}(n/\epsilon) \cdot 2^{O(\ell \log \ell)}$.
A generalization of Case IIb from [Ser07] is also possible in this case. For constant-biased
product distributions this is easy (in fact we obtain a bound of $\sqrt{n \log(1/\epsilon)} \cdot 2^{O(\ell \log \ell)}$), because
the Hoeffding bound and Gaussian anti-concentration still apply. For $K$-wise independent
distributions, again one can use the tail bounds in [BR94] for $K$-wise independent distributions
in place of the Hoeffding bound; Gaussian anti-concentration can also be deduced in this case
using Fact 36.

Now using Lemma 23 the proof follows as in Section 3.4.

4.4 Discussion: some distributions require large-weight approximators.

We have shown that for some non-uniform distributions such as constant-biased product
distributions and $K$-wise independent distributions, every threshold function can be
$\epsilon$-approximated using integer weights at most $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$. An optimist might wonder
whether it is possible that under any distribution $\mathcal{D}$, every threshold function can be
$\epsilon$-approximated with integer weights $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$, or perhaps $n^{O(1/\epsilon^{2/3})}$. Here we observe
that such a strong bound cannot hold for every distribution:

Proposition 44. There is a probability distribution $\mathcal{D}$ over $\{-1,1\}^n$ and a threshold function
$f$ such that any integer-weight threshold function that $1/(n+2)$-approximates $f$ under $\mathcal{D}$ must
have weight $2^{\Omega(n)}$.

Proof. The function $f$ is the "ODD-MAX-BIT" function [Bei94] which, on input $x$, outputs
$(-1)^i$ where $i$ is the first index such that $x_i = 1$. It is straightforward to verify that $f$ is a
threshold function, and it is well known that any integer-weight representation of $f$ must
have weight $2^{\Omega(n)}$ (see e.g. [HV86]).

Anthony et al. [ABST95] give an explicit set $S$ of $n + 1$ points from $\{-1,1\}^n$ and show
that any threshold function $h$ that agrees with $f$ on all $n + 1$ points in $S$ must in fact be
identical to $f$ on all of $\{-1,1\}^n$ (the set $S$ is said to be a "specifying set" for $f$). Under the
uniform distribution on $S$ any $1/(n+2)$-approximator must be correct on all points of $S$,
and hence identical to $f$, and the result follows. □
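The claim that ODD-MAX-BIT is a threshold function can be checked by brute force for small $n$. The sketch below uses the weights $w_i = (-1)^i 2^{n-i}$, one standard exponential-weight representation; the function names, the convention that $f$ outputs $+1$ on the all-$(-1)$ input (where no index has $x_i = 1$), and the convention sign(0) = +1 are assumptions made here for illustration.

```python
from itertools import product

def odd_max_bit(x):
    """(-1)^i for the first (1-based) index i with x_i = 1; +1 on the all-(-1) input by convention."""
    for i, xi in enumerate(x, start=1):
        if xi == 1:
            return -1 if i % 2 == 1 else 1
    return 1

def omb_threshold(n):
    """Integer weights w_i = (-1)^i * 2^(n-i) and a threshold theta with f(x) = sign(w . x - theta)."""
    w = [(-1) ** i * 2 ** (n - i) for i in range(1, n + 1)]
    theta = -sum(w) - 1
    return w, theta

def sign(z):
    return 1 if z >= 0 else -1

n = 6
w, theta = omb_threshold(n)
# Compare the two definitions on all 2^n inputs.
agrees = all(
    odd_max_bit(x) == sign(sum(wi * xi for wi, xi in zip(w, x)) - theta)
    for x in product((-1, 1), repeat=n)
)
largest_weight = max(abs(wi) for wi in w)
```

The representation works because the weight of the first index with $x_i = 1$ exceeds the total magnitude of all later weights. The largest weight is $2^{n-1}$, and the lower bound cited in the proof says that exponentially large weights are unavoidable for this function.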

5 Conclusions and Future Work 

We have already discussed directions for future work relating to Theorem 1 in Section 2.5.
Regarding Theorem 2, we feel that our high-level approach using anti-concentration holds
promise for substantial further progress. Significant strengthenings of Halász's anti-concentration
bound are known under stronger restrictions on the additive structure of the weights $w_1, \dots, w_n$,
see e.g. [Vu08, TV08]. Can corresponding extensions of Lemma 26 be established, proving
that every threshold function admits a representation with weights that have the required
structure? Perhaps every threshold function $f$ can be $\epsilon$-approximated using integer weights
at most $\mathrm{poly}(n) \cdot 2^{\mathrm{polylog}(1/\epsilon)}$. We hope that further study of our anti-concentration based
approach may yield such a bound.

Acknowledgements. We thank Ryan O'Donnell for asking a question that led to Theo- 
Appendix



Proof of Claim 27

Recall Claim 27:

Claim 27. There exists a set of $n$ points $y^{(1)}, \dots, y^{(n)} \in f^{-1}(1) \cup D_n$ such that $w^*$ is the
unique solution of the linear system $\{w \cdot y^{(i)} = 1 \mid i = 1, 2, \dots, n\}$, denoted (*).

Proof. By definition, $w^* \in \mathbb{R}^n$ is a weight vector that satisfies a maximum number of
constraints in $\mathcal{LP}$ with equality. That is, there exist $s \in \mathbb{N}$ input vectors
$y^{(i_1)}, y^{(i_2)}, \dots, y^{(i_s)} \in f^{-1}(1) \cup D_n$ such that

$$\begin{pmatrix} y^{(i_1)} \\ y^{(i_2)} \\ \vdots \\ y^{(i_s)} \end{pmatrix} \cdot w^* = A \cdot w^* = \mathbf{1}_{s \times 1} \qquad (9)$$

and no other weight vector satisfies more than $s$ of the constraints with equality. Also note
that for all input vectors $x \in (f^{-1}(1) \cup D_n) \setminus \{y^{(i_j)}\}_{j \in [s]}$ it holds $w^* \cdot x > 1$.

Consider the linear system $Aw = \mathbf{1}_{s \times 1}$. By definition, $w^*$ is a solution to this system.
We will show that the system has a unique solution or equivalently that $\mathrm{rank}(A) = n$. Then
by selecting $n$ linearly independent rows of the matrix $A$, we get the linear system (*).

Suppose, for the sake of contradiction, that $\mathrm{rank}(A) < n$. Then, there exists a non-zero
vector that lies in the right null-space of $A$, i.e. there exists $\mathbf{0}_{n \times 1} \neq u \in \mathbb{R}^n$ such that
$Au = \mathbf{0}_{s \times 1}$. Consider the family of weight vectors $\{w^*_\epsilon \stackrel{\mathrm{def}}{=} w^* + \epsilon u\}_{\epsilon \in \mathbb{R}}$. We will argue that
there exists $\epsilon_0 \in \mathbb{R}^*$ such that the vector $w^*_{\epsilon_0} \neq w^*$ satisfies at least $s + 1$ of the constraints
of $\mathcal{LP}$ with equality, which is a contradiction.

We now proceed with the argument. We have the following:

1. For all $\epsilon \in \mathbb{R}$ and for all $j \in [s]$ it holds $w^*_\epsilon \cdot y^{(i_j)} = w^* \cdot y^{(i_j)} + \epsilon (u \cdot y^{(i_j)}) = 1$, since
$u \cdot y^{(i_j)} = 0$, by (9).

2. There exists at least one vector $y \in (f^{-1}(1) \cup D_n) \setminus \{y^{(i_j)}\}_{j \in [s]}$ such that $u \cdot y \neq 0$. This
holds true because the set $f^{-1}(1) \cup D_n$ (in fact $f^{-1}(1)$ itself) spans $\mathbb{R}^n$, while we are
assuming that the rank of $A$ is strictly less than $n$. (Recall that $f$ was assumed to be
odd, hence $f^{-1}(1)$ contains either $x$ or $-x$ for every $x \in \{-1,1\}^n$.) Let $U \neq \emptyset$ be the
corresponding set, i.e. $U = \{y \in (f^{-1}(1) \cup D_n) \setminus \{y^{(i_j)}\}_{j \in [s]} \mid u \cdot y \neq 0\}$. Let us also
denote $\bar{U} = \{y \in (f^{-1}(1) \cup D_n) \setminus \{y^{(i_j)}\}_{j \in [s]} \mid u \cdot y = 0\}$ for its complement.
We now claim that one can choose an appropriate value $\epsilon_0$ for $\epsilon$ such that for some
$y' \in (f^{-1}(1) \cup D_n) \setminus \{y^{(i_j)}\}_{j \in [s]}$ we have $w^*_{\epsilon_0} \cdot y' = 1$ and for all
$x \in (f^{-1}(1) \cup D_n) \setminus \{y^{(i_j)}\}_{j \in [s]}$ it holds $w^*_{\epsilon_0} \cdot x \ge 1$. The latter statement provides the
desired contradiction, since, combined with (1) above, it implies that the corresponding vector
$w^*_{\epsilon_0}$ is a feasible solution to $\mathcal{LP}$ and satisfies (at least) $s + 1$ constraints with equality, in
particular those corresponding to the points $\{y^{(i_j)}\}_{j \in [s]} \cup \{y'\}$.

Partition the set $U$ into $U_+ = \{y \in (f^{-1}(1) \cup D_n) \setminus \{y^{(i_j)}\}_{j \in [s]} \mid u \cdot y > 0\}$ and
$U_- = U \setminus U_+$. By (2) above, we know that at least one of the sets $U_+$ and $U_-$ is nonempty. We
analyze the case that $U_+ \neq \emptyset$, the case $U_- \neq \emptyset$ being very similar. Recall that for every
$x \in (f^{-1}(1) \cup D_n) \setminus \{y^{(i_j)}\}_{j \in [s]}$ it holds $w^* \cdot x > 1$. Now consider some $x_+ \in U_+$; we have that
$w^* \cdot x_+ > 1$ and $u \cdot x_+ > 0$. We therefore select:

$$\epsilon_0 \stackrel{\mathrm{def}}{=} \max_{x_+ \in U_+} \frac{1 - w^* \cdot x_+}{u \cdot x_+} \qquad (10)$$

First, it is clear that $\epsilon_0 < 0$, which implies that $w^*_{\epsilon_0} \neq w^*$. It is also straightforward to
verify that the remaining desired properties are satisfied. Indeed, there exists at least one
point $y' \in U_+ \subseteq (f^{-1}(1) \cup D_n) \setminus \{y^{(i_j)}\}_{j \in [s]}$ (a maximizer of (10)) such that
$w^*_{\epsilon_0} \cdot y' = w^* \cdot y' + \epsilon_0 (u \cdot y') = 1$. Also, if $x \in U_+$, then by the definition of $\epsilon_0$ above, we have that
$1 \le w^*_{\epsilon_0} \cdot x < w^* \cdot x$. Now if $x \in U_-$, then $w^*_{\epsilon_0} \cdot x > w^* \cdot x > 1$. Finally, if $x \in \bar{U}$, then
$w^*_{\epsilon_0} \cdot x = w^* \cdot x > 1$. Hence, we have $w^*_{\epsilon_0} \cdot x \ge 1$ for all $x \in (f^{-1}(1) \cup D_n) \setminus \{y^{(i_j)}\}_{j \in [s]}$, which
completes the proof of Claim 27. □